A STUDY IN THE PROGNOSIS OF MUSICAL TALENT*


ELIZABETH MEDERT TAYLOR 
College of Music of Cincinnati 


INTRODUCTION 


The problem.—This study is an attempt 
to evaluate a battery of musical and psycho- 
logical tests as a basis for prognosis of: 
(1) success in a college of music, and (2) 
success in music as a profession. Certain 
minor problems in the relationships of various 
trait measures are also involved. 

Sources of data.—A five-year testing program
was conducted at the College of Music
of Cincinnati from 1930 to 1935, employing 
the Seashore Measures of Musical Talent, 
the Kwalwasser-Dykema (K-D) Music Tests, the Kwalwasser Tests
of Melodic and Harmonic Sensitivity, the 
Measures of Musical Background (an orig- 
inal test devised for experimental purposes), 
and the Detroit Advanced Intelligence Test, 
Forms V and W. These tests constitute the 
total battery to be evaluated. The data used 
in determining the validity of the tests in- 
cluded marks of students in certain college 
courses in music, and judgments by compe- 
tent persons upon the professional success of 
the students as musicians. In 1939 the 185 
students participating in the testing program, 
at that time having been out of school three 
to seven years, were traced and additional 
data concerning them were gathered through 
questionnaires and personal contacts. Further 
data were secured from questionnaires admin- 
istered to the students at the time of their 
entrance into the College of Music, and from 
ratings of the students on certain traits, made 
at the end of the freshman year by their in- 
structors in their major fields of applied 
music. 

Scope.—As suggested in the preceding paragraph, the tests have been checked against two types of criteria: (1) success in college courses in music, and (2) success in music as a profession. In the first instance, the marks in courses furnished the major criteria. These were supplemented by the trait ratings made by teachers of applied music (piano, voice, etc.). In the second instance, graduates and former students of the College of Music were classified by competent judges as to degree of professional success.

* Abstract of a Doctor’s dissertation, University of Cincinnati, 1941. The advice of Dr. Gordon H ickson was especially helpful in planning and executing the study.

These two attempts to determine the valid- 
ity of the various tests in the battery con- 
stitute the main part of this investigation. 
However, interesting supplementary data 
were secured, incidental to the attack upon 
the problem of test validity. These data made 
possible an investigation of certain minor 
problems in the following fields: (1) charac- 
teristics of students entering the College of 
Music of Cincinnati as freshmen from 1930 
to 1935; (2) the predictive power of course 
marks for professional success; (3) the rela- 
tionship of intelligence test scores to music 
test scores; (4) the relationship of intelli- 
gence test scores to professors’ estimates of 
ability; (5) the relationship of scores on 
music tests in certain testing areas (such as 
pitch or rhythm) to scores on other tests in 
the same areas; (6) the intercorrelation of 
marks in college courses in music; and (7) 
the relationship of professors’ estimates of 
ability to marks in college courses. 

Controversial literature on the problem.—
The validity of standardized music tests has 
been subjected to discussion ever since their 
inception, psychologists and musicians hav- 
ing both defended and disparaged them. 
Brief quotations from prominent participants 
in this controversy may serve to indicate the 
need for further investigation of the subject. 

James L. Mursell of Columbia University, 
one of the opponents of the existing battery 
of standardized music tests, states: 


Psychologists of the highest repute, such as Ogden, Watt, Revesz, and Farnsworth, have carefully and conclusively shown that existing music tests are open to fundamental criticism, and must be used with much reserve and care.¹


Refuting an often-repeated accusation that 
he is essentially “hostile” to music tests of all 
kinds, Mursell expresses a desire for a sound 
prognostic measure: 


It would be a research instrument which 
we sorely need, and also an invaluable 
agency for educational guidance. But I am 
not convinced that any such exist, although 
I believe that they could be developed . . .
In many lines of educational work we can 
never hope for exact demonstration. But 
this is not so with tests. Tests are capable 
of exact, complete, and unanswerable anal- 
ysis, and we ought to insist that they re- 
ceive it before we accept them, and, still 
more, before we apply them and proceed 
to offer guidance on their results. We have 
every right—nay, we have a positive duty 
—to demand stringent proof that any given 
test will really do what it promises. And 
my objection to the existing music tests is 
very simple. They have never been proved up.²


Concerning the validity of existing tests of 
musical talent, Mursell contends: 


The woods are full of published tests 
with dishonest titles . . . We think the 
Terman Group Test of Intelligence really 
does measure something—though just what 
it is we know not—called general intelli- 
gence not because Dr. Terman says it does, 
but because Dr. Terman has developed a 
proof of his claim, and because he has pub- 
lished his proof so that we can study it and 
form our own judgment. What then about 
the music tests? After a careful examina- 
tion of all the research studies I have been 
able to find, and they are not few, I am 
compelled to the opinion that in the case 
both of the Seashore Measures of Musical 
Talent and the Kwalwasser—Dykema Music 
Tests such proof is entirely lacking.³


Speaking further of the value of the tests, 
Mursell holds that the only valid method of 
finding whether a given test measures what
it purports to measure is 


1 James L. Mursell, “What About Music Tests?” Music 
Educators Journal, XXIV (October-November, 1937), 16. 


2 Ibid., p. 16. 
3 Ibid., p. 16.
“to ascertain whether persons rating high or low or medium on these tests also rate high and low and medium in what one may call ‘musical behavior’.”⁴


Since the results of Stanton’s Eastman Ex- 
periment and the results of other investiga- 
tions of the Seashore battery fail to validate 
the measures, Mursell reaches the conclusion 
that 


“the relationship of the tests to musical talent as this manifests itself in the types of musical behavior studied is not proved.”⁵


Carl E. Seashore of the University of 
Iowa, a proponent of the tests, answered 
Mursell’s propounded omnibus theory of 
“blanket rating against all musical behavior”⁶
by designating his own divergent system of 
thought as the theory of specific, isolated 
factors in measurement. 


They (the tests) have been validated 
for what they purport to measure. This is 
an internal validation in terms of success 
in the isolation of the factor measured and 
the degree of control of all other factors in 
the measurement ... They should not 
be validated in terms of their showing on 
an omnibus theory. . . . Validation of
pitch against the violinist’s artistic per- 
formance in the actual musical situation 
would require that we correlate the sense 
of pitch with objective records of musical 
performance in pitch intonation or ability 
to hear artistic pitch deviation in the musi- 
cal situation—not with the countless other 
merits or demerits that the violinist may 
exhibit.⁷

Thus, Seashore considers musicality to consist of a large number of isolated, limited traits rather than of a single “omnibus” factor in the musical temperament.

Another staunch supporter of the tests, 
Jacob Kwalwasser of the University of 
Syracuse, attempted to answer some of Mur- 
sell’s arguments. 


The specific weakness is the uncontrolled variable. How does Dr. Mursell know that the low coefficients are due to the unreliability of the tests? The only value of great unreliability to which he closes his eyes, is contained in the column marked criterion. How did our critic prove up this column? Just how accurate are teachers’ grades?⁸

4 Ibid., p.

“The Psychology of Music,” Music Educators Journal, XXIV (December, 1937), 25-26.


Kwalwasser points out the great need for 
further research in the science of music 
pedagogy. 

To encourage testing and measuring is to 
encourage a problem solving attitude on 
the part of the music teaching profession; 
to discourage testing and measuring is to 
return to a state of empirical chaos and 
ignorance. Let there be more light!⁹


Review of testing in music.—Objective testing in music has trailed that in other fields, possibly because of the difficulty of the medium and the disinclination of artists to accept the theory of tests. But from the beginning of the third decade of the present century to the present, the growth in the number of such tests has been astonishing. A bibliography¹⁰ issued in 1936 lists almost fifty tests, and others have appeared since that date.

Twelve prognostic tests in music have been 
developed. They attempt to measure musical 
talent in terms of sensory elements. Two of 
them (Seashore Measures of Musical Talent 
and Kwalwasser-Dykema Tests) have re- 
ceived major attention in the work of in- 
vestigators. 

These studies have investigated the uses of 
tests for prognosis of musical ability, the 
relation of intelligence to musical achieve- 
ment, relationship of test scores to teachers’ 
ratings, and other topics. Such experiments 
have been carried on by Bogen, Brennan, 
Broom, Chadwick, Dean, Emerick, Farns- 
worth, Fracker and Howard, Gaw, Highsmith, 
Lanier, Larson, Loudon, McGinnis, Mursell, 
Salisbury and Smith, Stanton, Tilson, More, 
Whitely, Manzer and Marowitz, Barnard, and 
C. H. Taylor. 

Numerous investigators have reported on the reliability and validity of prognostic tests. In most cases they have estimated both reliability and validity to be low. These studies have been made by Brown, Farnsworth, Drake, Gilliland and Jensen, Heinlein, Larson, Stanton, Mursell, and E. M. Taylor.

8 Jacob Kwalwasser, “From the Realm of Guess into the Realm of Reasonable Certainty,” Music Educators Journal, XXIV (... 1938), 16.

9 Ibid., p.

10 Cecile W. Flemming, Marion Flagg, and others, A Descriptive Bibliography of Prognostic and Achievement Tests in Music. New York: Bureau of Publications, Teachers College, Columbia University, 1936.


THE MEASURES EMPLOYED IN THIS STUDY


Seashore and Kwalwasser-Dykema tests.—Detailed descriptions of the Seashore and Kwalwasser-Dykema Tests are available in many publications.¹¹ The original version (1919) of the Seashore Tests was the one used in this study.

The measures of musical background.— 
This is a group of four tests devised by Cor- 
win H. Taylor for experimental purposes at 
the College of Music of Cincinnati. The 
description that follows is of the original 
form used in this study, rather than of the
more recent revision.¹²

In the Discrimination of Mode Test the 
student is asked to determine the mode 
(major or minor) of harmonized and unhar- 
monized melodies, and chords in various 
positions and inversions, played upon the 
pianoforte. 

In the Detection of Rhythm Test the 
student is asked to determine the number of 
beats in a measure in twenty selections played 
upon the pianoforte. 

In the Discrimination of Mood Test the 
student is given twenty sets of five words, 
each set describing a feeling or emotion. A 
selection is then played upon the pianoforte 
and the student is asked to determine which 
word most adequately describes the mood of 
the selection. 

In the Recognition of Themes Test, twenty 
selections are played upon the pianoforte. 
The student is asked to give the complete 
name, and the composer or source of each 
selection. 

Intelligence test.—The Detroit Advanced Intelligence Test¹³ was part of the battery used in this study.

College courses.—Marks in courses offered by the College of Music of Cincinnati in dictation, sight singing, harmony, and history of music furnished one of the types of criteria used in checking the validity of the various music tests.

11 James L. Mursell, The Psychology of Music, pp. 289-290, 323-324. New York: W. W. Norton and Co., 1937.

12 For further information regarding the revised test, see Corwin H. Taylor, “... Validation of Certain Experimental ...,” unpublished Doctor’s dissertation, University of Cincinnati, 1941.

13 Harry J. Baker, Manual of Directions, Detroit Advanced Intelligence Test, Forms V and W, pp. 13-15. Bloomington, Illinois: Public School Publishing Company, 1924.


Professors’ estimates of ability.—Professors’
estimates of ability, collected for this
study from the records of the College of 
Music of Cincinnati, were made at the end 
of the freshman year. For each student, the 
ratings were made by the professor under 
whom he had worked in his major field of 
applied music (piano, voice, etc.). These 
estimates were never made by the teachers 
of any of the courses discussed in the preceding section. Students were rated on a six-point scale: Superior (A), Very Good (B), High Average (C+), Low Average (C-), Poor (D), and Very Poor (E). Items on which students were rated included: Musical Talent (inborn capacity for musical achievement); Musical Feeling (artistic temperament, creative imagination, and initiative in interpretation); Technique (mechanical ability in performance); Rhythmic Action (ability for rhythmic expression in playing or singing); Quality of Tone (in playing or singing); Application (effort, faithfulness in practice, sustained interest and attention); and Achievement (progress in technique and musical expression).

Questionnaires.—A questionnaire, required
of every student at the beginning of his 
freshman year, asked pertinent questions re- 
garding: socio-economic background (nation- 
ality of parents, occupation of parents, num- 
ber of children in the family, etc.); musical 
background (musical instruments studied, 
where and when studied, and other previous 
musical training and experience); status at 
entrance (outside occupation other than 
music, and degree, diploma, or certificate 
intentions). 

A second questionnaire, devoted primarily 
to professional or other career after leaving 
the College of Music, was sent in October, 
1939, to the subjects of this investigation. 


CHARACTERISTICS OF 150 STUDENTS ACHIEVING FRESHMAN STATUS AT THE COLLEGE OF MUSIC OF CINCINNATI FROM 1930 TO 1935


The present section provides a general 
background of information as to the subjects 
of this study. It is of importance chiefly be- 
cause of the need for an adequate description 
of the students whose achievement in college 
and in the field of music furnishes the criteria
for evaluating the tests studied. 

The following statements are based upon 
data derived from questionnaires answered by 
students of the College of Music of Cincin- 
nati at the beginning of their freshman year, 
and three to seven years after their departure 
from college. 


1. Sixty-eight per cent of the students 
stated the nationality of their parents as 
American, fourteen per cent stated it as 
German. The remainder mentioned various 
other nationalities. 

2. Around ninety per cent of the students 
came from middle and upper-middle class 
homes in which the father was a professional 
man and the mother a housewife. The typical family included three children, although eight per cent of the families had more than five children, and twenty per cent of the students were only children.

3. Thirty-five per cent of the group had no 
college graduates, professional musicians, or 
music teachers in their immediate families; 
thirty-five per cent had college graduates but 
no one in either of the other categories. 


4. Thirty per cent of the students came to 
college the year following their graduation
from high school; more than half of the re- 
mainder were out of high school two to three 
years before achieving freshman status; eight 
per cent of the entire group were out more 
than ten years. 

5. Three-fourths of the students possessed 
no previous academic training; of the re- 
mainder, almost thirty-five per cent had train- 
ing of one year, one-fifth had two years, and 
one-ninth had four years. 

6. One-fifth of the students had studied 
voice one to two years before coming to col- 
lege; almost all of the remainder had had no 
vocal training.

7. Almost one-third of the instrumental 
students had studied five to six years; one- 
fifth had studied from ten to seventeen years; 
one individual had no previous instrumental 
training. 

8. Four-fifths of the students had studied 
piano before coming to college; one-fifth had 
studied organ; more than one-fourth had 
studied violin; tympani, drums, trombone, 
and double bass had been least frequently 
studied. 
















September, 1941] 


9. Almost one-half of the students played
one instrument only; forty per cent played 
two instruments; ten per cent played three or 
more instruments. 

10. More than one-third of the students 
had other musical training; more than three- 
fourths had other musical experience; one- 
fourth pursued occupations other than music. 

11. Almost one-third of the students never 
received an honor or degree; more than one- 
third received a bachelor’s degree; thirty per 
cent received a certificate or diploma; ten per 
cent received two bachelor’s or a master’s 
degree. 

12. Thirteen per cent of the students became full-time professional performers; an equal number ended their musical careers
with marriage; more than one-third became 
teachers; approximately two-thirds continued 
to regard music as their major interest. 

In interpreting these statements, it must 
be remembered that the period of time cov- 
ered by the statistics includes the so-called 
“depression years.” 

The results, as they stand, record the per- 
formances of only the one hundred and fifty 
students who matriculated as freshmen at the 
College of Music of Cincinnati from 1930 to 1935.


THE VALIDITY OF CERTAIN MEASURES FOR PROGNOSIS OF COLLEGE SUCCESS


INTRODUCTION 


This section is concerned with the problem 
of the validity of four batteries of music tests 
and of an intelligence test for the prognosis 
of success in a college of music. The criterion
of college success was considered to be the 
ability to succeed in the college courses of 
dictation, sight singing, harmony, and history 
of music. These courses were selected on the 
grounds that dictation and sight singing may 
be classified as purely musical studies, har- 
mony as a musical-intellectual study, and 
history of music as a purely intellectual study. 
The fundamental data consist of coefficients 
of correlation, computed by the Pearson 
product-moment method, between marks in 
these courses and scores on the music and 
intelligence tests. 

College courses.—Dictation and sight singing, taught in some schools as a combined class called ear training, are taught as separate subjects at the College of Music of Cincinnati, and their marks are recorded separately. Ability in dictation and sight singing generally is considered to be most indicative of musicality (more studies have used these subjects as criteria than any other subject).
Mursell quotes Vidor to the effect that 


“musicality . . . expresses itself in the ability to apprehend coordinated musical structures and to reproduce or produce them.”¹⁴


He adds, on his own account: 


A higher level of musicality manifests 
itself with the power to perceive and re- 
spond to pitch and interval relationship. It 
should be observed that the essential factor 
here is not pitch discrimination . . . but 
rather the awareness of relatedness among 
tones—a perceptive, not a sensory ability. 
Such an awareness can be carried to an 
almost unlimited degree of refinement and 
precision, and is probably the most char- 
acteristic mark of the supremely musical 
individual.¹⁵


The study of harmony has long been a 
basic one in every branch of musical endeavor. 
Since it has held, and holds now, this impor- 
tant place in the curriculums of all music 
schools, it seems safe to assume that music 
educators consider the ability to write harmony well an index of musical ability. Certainly, the student who is unable to pass this introductory theoretical course successfully has little chance of graduation. Since it is known that some students can, and do, write harmony by rule, with a complete lack of the auditory imagery that is so necessary in writing dictation, it is probably true
that ability in harmony is much less indica- 
tive of musicality than is ability in dictation. 
However, success in harmony should provide 
one significant criterion of musical ability. 

Since history of music has been classified 
earlier as an intellectual subject, its use as a 
criterion of musicality may seem somewhat 
doubtful on the surface. Its inclusion may be 
justified by the fact that, like harmony, it 
forms a part of the curriculums of every 
music school. Granted that ability in history of music has little direct connection with ability in music in general, it nevertheless seems worthwhile to discover what correlation may exist between achievement in history of music and the various measures employed in this study.

14 James L. Mursell, The Psychology of Music, p. 325. New York: W. W. Norton and Company, 1937.

Two factors that materially affect the re- 
sults of the computations, and consequently 
their interpretations, should be considered in 
the use of these college subjects as criteria 
for college success in music. These factors 
are the marking system and the distribution 
of marks. 

Marking system.—In order to make clear 
the significance of the marks in the school 
subjects, there follows a description of the 
marking system of the College of Music. 
Letter marks are assigned as follows: 94-100 (A), 90-93 (A-), 87-89 (B+), 84-86 (B), 80-83 (B-), 77-79 (C+), 74-76 (C), 70-73 (C-), 67-69 (D+), 64-66 (D), 60-63 (D-), 50-59 (D, Condition), below 50 (F, Failure). Marks in sight singing, harmony, history of music, and the professors’ estimates are given in letters. For computation, it was therefore necessary to assign arbitrary numerical values to these letters: A (95), A- (92), B+ (88), B (85), B- (82), C+ (78), C (75), C- (72), D+ (68),
Mark distributions.—Figure 1 presents
graphically the distribution of marks in dic- 
tation and sight singing, and Figure 2 the 
distribution of marks in harmony and history 
of music. It may easily be seen that the 
mark distributions of the four college subjects 
vary greatly. The distribution for dictation 
ranges from zero to one hundred. Nineteen 
scores fall below sixty (the passing mark) ; 
the rest are well distributed over the A, B, C, 
and D mark ranges, with an approximately 
equal number in the A, B, and C groups. The 
marks in sight singing range from fifty-five 
to ninety-six, a smaller range with no low 
extremities, since number marks were trans- 
muted from letter marks as described above. 
As is the case with the distribution for dicta-
tion, the sight singing marks are well dis- 
tributed, but with the greatest number at B 
or eighty-five. The distribution for harmony 
presents quite a different picture, the range 
being from sixty-two to ninety-six with all 
but twenty-one of the marks being eighty-five 
or above. The history of music distribution is 
similar to that for harmony, only eleven scores 
falling below eighty-five. 
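Because sight singing, harmony, and history of music were marked in letters, the letters had to be turned into numbers before correlating. The following is a minimal Python sketch of that bookkeeping, assuming only the bands and assigned values stated in the marking-system paragraph; the names `LETTER_VALUES`, `BANDS`, `letter_for`, and `transmute` are hypothetical. The list of assigned values breaks off after D+, so the lookup refuses lower letters rather than guessing their numbers.

```python
# Numerical values assigned to letter marks, as listed in the text.
# The list breaks off after D+, so lower letters are deliberately omitted.
LETTER_VALUES = {
    "A": 95, "A-": 92, "B+": 88, "B": 85, "B-": 82,
    "C+": 78, "C": 75, "C-": 72, "D+": 68,
}

# Lower bound of each letter band in the marking system, highest first.
BANDS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def letter_for(mark: int) -> str:
    """Return the letter band for a numerical mark on the 0-100 scale."""
    for lower_bound, letter in BANDS:
        if mark >= lower_bound:
            return letter
    return "D, Condition" if mark >= 50 else "F"


def transmute(letter: str) -> int:
    """Convert a letter mark to the number used when computing correlations."""
    try:
        return LETTER_VALUES[letter]
    except KeyError:
        raise ValueError(f"no numerical value listed for {letter!r}") from None


# Sanity check: every assigned number falls inside its own letter band.
for letter in LETTER_VALUES:
    assert letter_for(transmute(letter)) == letter

print(transmute("B"), letter_for(85), letter_for(55))
```

The round-trip check is a useful guard: each arbitrary value sits inside its band, so the conversion never moves a student into a different letter grade.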


In general, the marks in sight singing and 
dictation are distributed widely, those in har- 
mony and history of music, narrowly. It does
not seem probable that abilities in these four 
subjects are distributed in such varying 
fashion. Dictation and sight singing are 
taught by one teacher, and harmony and 
history of music by another. These facts are 
noted, not in criticism of the marking in any 
particular subjects, but because the correla- 
tion coefficients undoubtedly are affected in 
size by the variability of the data from which 
they are computed. When variability in one 
or both measures becomes very small, the
coefficient tends toward zero.
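The effect of small variability can be seen in a quick simulation. This is a hypothetical illustration, not data from the study: the score and mark distributions, the narrow band of 84 to 86, the seed, and the helper `pearson_r` are all assumptions. The same underlying relationship yields a much weaker coefficient once only students with bunched marks are kept.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1941)  # fixed seed so the run is repeatable

# Hypothetical data: a test score and a course mark sharing a linear component.
scores = [rng.gauss(50, 10) for _ in range(2000)]
marks = [80 + 0.5 * (s - 50) + rng.gauss(0, 5) for s in scores]

# Keep only students whose marks fall in a narrow band, mimicking marks
# that are bunched together with little spread.
bunched = [(s, m) for s, m in zip(scores, marks) if 84 <= m <= 86]

full_r = pearson_r(scores, marks)
narrow_r = pearson_r([s for s, _ in bunched], [m for _, m in bunched])
print(f"all marks:   r = {full_r:.2f}")
print(f"narrow band: r = {narrow_r:.2f} (n = {len(bunched)})")
```

The full sample shows a marked relationship, while the restricted one falls to a low value even though nothing about the underlying link has changed.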

In the following section the college courses 
in dictation, sight singing, harmony, and 
history of music will be discussed in turn. 


THE PREDICTIVE VALUE OF TEST SCORES FOR SPECIFIC COLLEGE COURSES


In determining the value of scores on tests 
for the prediction of success in college courses, 
coefficients of correlation were computed between the test scores and course marks by the Pearson product-moment method.
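For reference, the product-moment coefficient is the standard one. The notation below is ours, not the author's: $X_i$ is a student's test score, $Y_i$ the same student's course mark, and $N$ the number of students with both.

```latex
r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^{2} \; \sum_{i=1}^{N} (Y_i - \bar{Y})^{2}}}
```

Because $r$ depends only on deviations from the means scaled by their spread, any positive linear rescaling of either variable leaves it unchanged. Arbitrary numbers assigned to letter marks therefore matter only through their relative spacing.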

Interpretation of coefficients of correlation.—In order to assist in the interpretation of
coefficients of correlation, the following state- 
ment is quoted from Garrett: 

In the field of mental measurement it is 
customary to describe the correlation be- 
tween two tests in a general way as being 
high, marked or substantial, low, or neg- 
ligible. While the descriptive label applied 
will vary somewhat in meaning with the 
author using it, there is fairly good agree- 
ment among workers with psychological 
and educational tests that an

r from .00 to ±.20 denotes indifferent or negligible relationship;

r from ±.20 to ±.40 denotes low correlation; present but slight;

r from ±.40 to ±.70 denotes substantial or marked relationship;

r from ±.70 to ±1.00 denotes high to very high relation.
This classification is broad and somewhat tentative, and can be accepted as a general guide only in the light of various qualifications. A coefficient of correlation should always be evaluated with regard to (1) the nature of the material dealt with; (2) P.E.; (3) the size and variability of the group . . .; (4) the reliability coefficients of the tests . . .; and (5) the purpose for which the r was computed. . . . In the field of vocational testing, the r’s between test batteries and measures of aptitude represented by various criteria are rarely above .50 . . ., and r’s above this figure would be considered surprisingly high.¹⁶
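Garrett's descriptive scale amounts to a lookup on the size of r. The sketch below is a hypothetical helper, `garrett_label`, written only to make the bands explicit. The quoted ranges share their endpoints (.20, .40, .70), so assigning a boundary value to the higher band is an assumption of this sketch, not something the quotation settles.

```python
def garrett_label(r: float) -> str:
    """Return Garrett's descriptive label for a correlation coefficient.

    Only the size of r matters; its sign is ignored. A value falling exactly
    on a shared boundary is assigned to the higher band (an assumption).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.70:
        return "high to very high"
    if size >= 0.40:
        return "substantial or marked"
    if size >= 0.20:
        return "low; present but slight"
    return "indifferent or negligible"


# The three largest dictation coefficients discussed below all land in the
# "substantial or marked" band.
for r in (0.645, 0.593, 0.445):
    print(r, garrett_label(r))
```

As Garrett himself cautions, such a label is only a first reading; the probable error, group size, and purpose of the coefficient still have to be weighed.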


Correlation between marks in dictation and scores on music tests.—Table I presents the data resulting from the correlation of dictation marks with test scores. The coefficients are arranged in order of size, from the largest to the smallest without regard to sign. Since not all students took all of the tests, N is variable.

TABLE I

COEFFICIENTS OF CORRELATION BETWEEN MARKS IN DICTATION AND SCORES ON MUSIC TESTS

Measures                                     r        P.E.     N
Background Discrimination of Mode          .645      .0383    145
K-D Pitch Imagery                          .593      .0386    144
K-D Tonal Memory                           .445      .044     147
Background Rhythmic Discrimination         .3887     .047     146
Seashore Sense of Intensity                .208      .047     151
K-D Intensity Discrimination               .2938     .051     147
Seashore Sense of Time                     .272      .051     151
Seashore Tonal Memory                      .268      .051     147
Seashore Sense of Consonance               .257      .051     147
K-D Rhythm Imagery                         .256      .053     145
Seashore Sense of Rhythm                   .214*     .057     147
K-D Quality Discrimination                 .211*     .059     147
Kwalwasser Harmonic Sensitivity            .208*     .053     147
K-D Melodic Taste                         -.127*     .055     147
K-D Tonal Movement                         .102*     .055     147
K-D Rhythm Discrimination                  .094*     .055     147
K-D Time Discrimination                    .064*     .056     148
K-D Pitch Discrimination                   .064*     .056     148
Kwalwasser Melodic Sensitivity             .058*     .055     149
Seashore Sense of Pitch                    .027*     .055     150

* Coefficients less than four times their probable errors.
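Ordering "by size without regard to sign" is simply a sort on the absolute value of r. A small Python illustration, using a handful of rows copied from Table I (the variable names are ours):

```python
# A few (measure, r) pairs copied from Table I, deliberately out of order.
rows = [
    ("K-D Tonal Movement", 0.102),
    ("K-D Pitch Imagery", 0.593),
    ("K-D Melodic Taste", -0.127),
    ("K-D Tonal Memory", 0.445),
]

# Sort by the magnitude of r, largest first; the sign is ignored.
ordered = sorted(rows, key=lambda row: abs(row[1]), reverse=True)
for name, r in ordered:
    print(f"{name:<22}{r:+.3f}")
```

This is why the negative coefficient for K-D Melodic Taste appears above smaller positive ones in the table.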

16 Henry E. Garrett, Statistics in Psychology and Education, pp. 342-343. New York: Longmans, Green, and Company, 1937.
Coefficients denoting substantial relationship.—Of the twenty tests correlated with dictation, the following three fall in Garrett’s third category, indicating substantial or marked relationship: Background Discrimination of Mode, K-D Pitch Imagery, and K-D Tonal Memory.

The Background Discrimination of Mode 
Test measures an ability which, to the 
knowledge of the writer, has not been utilized 
up to this time. The ability to distinguish a 
major melody from a minor one, and a major 
chord in various positions and pitches from a 
minor one, may not, in itself, be fundamental 
to dictation; yet the individual with sufficient 
keenness of ear to score high on such a test 
possesses the perceptual ability mentioned by 
Mursell to a high degree, and is therefore
likely to be able to apply it in dictation.

Since dictation requires auditory imagery, 
it seems not illogical that scores on the K-D Pitch Imagery Test should correlate highly with marks in dictation. Although Mursell in his discussion of the K-D tests doubts the value of the technique, the evidence points conclusively to the intrinsic relationship.

There is a marked decrease in the size of the third coefficient, that between scores in K-D Tonal Memory and marks in dictation. The relatively high coefficient for this test, which has been proved in other studies to be the best of the K-D group,¹⁷ bears out the obvious close relationship between dictation and tonal memory.

Coefficients denoting slight relationship.—A middle group of tests indicating only slight relationship to ability in dictation includes the following: Background Rhythmic Discrimination, Seashore Sense of Intensity, K-D Intensity Discrimination, Seashore Sense of Time, Seashore Tonal Memory, Seashore Sense of Consonance, K-D Rhythm Imagery, Seashore Sense of Rhythm, K-D Quality Discrimination, and Kwalwasser Harmonic Sensitivity.

Since K-D Pitch Imagery scores yielded the relatively high coefficient of .593 when correlated with dictation marks, it seems inconsistent that K-D Rhythm Imagery, which measures an ability also essential to skill in taking dictation, should yield a coefficient of only .256. It may be that the test itself is inferior rather than that rhythm is less important in taking dictation than is pitch.
17 Max Schoen, The Psychology of Music, p. 188. New York: Ronald Press Company, 1940.
Other tests that should correlate highly with dictation because of the abilities they ostensibly measure are: Background Rhythmic Discrimination (this coefficient is higher than that of any other rhythm test and ranks first in the middle group), Seashore Sense of Time, Sense of Rhythm, and Tonal Memory, the latter being especially conspicuous in view of the fact that K-D Tonal Memory stood third in the high group.

The coefficients of the three tests that are lowest in the group are less than four times their probable errors and, therefore, are not significant. In Garrett’s terms,

“to be reasonably sure that at least some degree of correlation greater than zero is present, an obtained r should be four times its P.E.”¹⁸
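The text does not state how the probable errors were obtained. The sketch below assumes the standard textbook formula for the probable error of r, P.E. = 0.6745(1 - r²)/√N, which closely reproduces several printed values in Table I. The function names `probable_error_r` and `exceeds_four_pe` are hypothetical.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r from n pairs (standard textbook formula)."""
    # 0.6745 converts a standard error into a probable error.
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def exceeds_four_pe(r: float, pe: float) -> bool:
    """Garrett's rule of thumb: |r| should be at least four times its P.E."""
    return abs(r) >= 4 * pe


# Seashore Sense of Time in Table I: r = .272, N = 151, printed P.E. = .051.
pe = probable_error_r(0.272, 151)
print(round(pe, 3), exceeds_four_pe(0.272, pe))

# Seashore Sense of Rhythm: r = .214 with printed P.E. = .057 falls short.
print(exceeds_four_pe(0.214, 0.057))
```

The rule is deliberately conservative: an r of .214 with a P.E. of .057 is positive, but four P.E.'s (.228) exceed it, so the coefficient is treated as not significant.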


Coefficients denoting negligible relationship. The remaining seven tests show negligible relationship to dictation marks since, like the three mentioned in the preceding paragraph, they have P.E.'s too large to yield significant r's. These tests are: K-D Tonal Movement, K-D Rhythm Discrimination, K-D Time Discrimination, K-D Pitch Discrimination, Kwalwasser Melodic Sensitivity, Seashore Sense of Pitch, and K-D Melodic Taste. That such requisites of dictation ability as pitch, time, and rhythm should produce low coefficients can mean only that, though the tests may measure discrimination, they do so through artificial techniques; certainly, the discrimination measured is not of the type that functions in dictation. The last named test, K-D Melodic Taste, appears to correlate negatively with dictation, yet its coefficient, too, is too low to be reliable. A low r is to be expected here, since ability to distinguish the better of two melodies is hardly necessary to ability in writing them.

Summary for dictation. Of the twenty tests, one-half are statistically reliable and indicate some degree of predictive power. Only three offer substantial prediction of success in dictation. These three, in order of the size of their coefficients, are: Background Discrimination of Mode (.645), K-D Pitch Imagery (.593), and K-D Tonal Memory (.445).

Correlation between marks in sight singing and scores on music tests. Table II lists the coefficients of correlation between scores on various parts of the test battery and marks

¹⁸ Henry E. Garrett, op. cit., p. 281.

TABLE II

COEFFICIENTS OF CORRELATION BETWEEN
MARKS IN SIGHT SINGING AND
SCORES ON MUSIC TESTS

Measures                                   r    P.E.     N
Background Discrimination of Mode       .507    .043   144
K-D Pitch Imagery                       .348    .049   142
Seashore Sense of Intensity             .331    .049   149
K-D Tonal Memory                        .286    .052   145
Seashore Tonal Memory                   .233    .053   146
K-D Pitch Discrimination              -.184*    .054   145
Seashore Sense of Time                 .171*    .054   148
K-D Rhythm Discrimination              .166*    .055   145
K-D Intensity Discrimination           .148*    .057   145
Seashore Sense of Rhythm               .142*    .054   147
Seashore Sense of Pitch                .121*    .055   147
K-D Quality Discrimination             .109*    .056   145
K-D Rhythm Imagery                     .089*    .057   145
K-D Time Discrimination               -.076*    .056   145
K-D Melodic Taste                      .065*    .056   145
K-D Tonal Movement                     .064*    .056   145
Kwalwasser Melodic Sensitivity         .064*    .055   148
Background Rhythmic Discrimination    -.029*    .056   145
Seashore Sense of Consonance          -.053*    .055   146
Kwalwasser Harmonic Sensitivity       -.045*    .055   146

* Coefficients less than four times their probable errors.


in sight singing. A general comparison of this 
distribution with that for dictation reveals 
that the majority of measures fall in the same 
relative position, although the entire range is 
lower. 

Coefficients denoting substantial relationship. Only one test, Background Discrimination of Mode, indicates marked relationship with sight singing marks. Its coefficient (.507) is lower than that for dictation (.645), suggesting that keenness of perception for tones is also required for sight singing, but to a lesser degree.


Coefficients denoting slight relationship. The middle group denoting slight relationship includes K-D Pitch Imagery, Seashore Sense of Intensity, K-D Tonal Memory, and Seashore Tonal Memory. With the exception of Seashore Sense of Intensity and Seashore Tonal Memory, these measures fall in the high group for dictation. This would seem to indicate that, although the same abilities are desirable for sight singing, and perhaps as necessary to it as to dictation, their function is smaller in sight singing.

Coefficients denoting negligible relationship. Fifteen of the twenty measures show negligible relationship to sight singing: K-D Pitch Discrimination, Seashore Sense of Time, K-D Rhythm Discrimination, K-D Intensity Discrimination, Seashore Sense of Rhythm, Seashore Sense of Pitch, K-D Quality Discrimination, K-D Rhythm Imagery, K-D Time Discrimination, K-D Melodic Taste, K-D Tonal Movement, Kwalwasser Melodic Sensitivity, Background Rhythmic Discrimination, Seashore Sense of Consonance, and Kwalwasser Harmonic Sensitivity. Each of these coefficients is less than four times its probable error, so it is possible that no correlation greater than zero exists. As stated in the discussion of dictation, these tests apparently do not measure the type of pitch, rhythm, and so on, that actually functions in a true musical situation such as sight singing. Furthermore, this indication of low validity substantiates the findings of other investigators.

One other comparison of coefficients is worthy of note. Background Rhythmic Discrimination scores correlate .387 with dictation marks and -.029 with sight singing marks. On the surface, this seems a serious discrepancy. On analysis, however, it becomes clear that aural recognition of meter is a necessity in taking dictation, whereas the reproduction of written metrical symbols, which functions in sight singing, may be quite another and more easily acquired ability.

A further reason, entirely apart from those 
stated above, for the decrease in size of the 
coefficients for sight singing may be connected 
with the marking system. For purposes of 
calculation, numbers were assigned to the 
letter marks as described earlier in this chap- 
ter, thus causing the scores to pile up around 
certain points, e.g., A (95), B (85), and so on.
Dictation marks, however, having been re- 
corded originally in numbers (marks which 
students actually earned), are spread more 
evenly throughout the range than are the 
sight singing marks. Examination of the dis- 
tributions, which appear in Figure 1, will 
clarify this point. 
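The marking-system argument can be illustrated with synthetic data. This is a hedged sketch, not a reanalysis: the scores and marks are simulated, the true correlation of .6 is arbitrary, and the grade cut-offs and the values C (75) and D (65) are assumptions (the text gives only A (95) and B (85)). It compares r computed from numeric marks with r computed after the same marks are collapsed onto letter-grade values.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def to_letter_value(mark):
    """Collapse a numeric mark onto letter-grade values.
    Hypothetical cut-offs; only A = 95 and B = 85 come from the text."""
    if mark >= 90:
        return 95  # A
    if mark >= 80:
        return 85  # B
    if mark >= 70:
        return 75  # C (assumed)
    return 65      # D (assumed)


def simulate(n=5000, true_r=0.6, seed=1941):
    """Return (r with numeric marks, r with letter-grade values)."""
    rng = random.Random(seed)
    scores, marks = [], []
    for _ in range(n):
        z_test = rng.gauss(0, 1)
        # Mark correlated with the test score at about true_r.
        z_mark = true_r * z_test + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        scores.append(z_test)
        marks.append(80 + 8 * z_mark)  # numeric marks centred near 80
    letters = [to_letter_value(m) for m in marks]
    return pearson_r(scores, marks), pearson_r(scores, letters)


if __name__ == "__main__":
    r_numeric, r_letter = simulate()
    print(f"numeric marks: r = {r_numeric:.3f}; "
          f"letter-grade values: r = {r_letter:.3f}")
```

Under these assumptions the letter-grade r comes out lower than the numeric r; how much lower depends on the number of grade steps and where the cut-offs fall, so the sketch shows the direction of the effect rather than its size in this study.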

Summary for sight singing. Only one test, Background Discrimination of Mode, indicates substantial relationship to sight singing, but its predictive value for sight singing (.507) is considerably less than for dictation (.645). Tests in the distribution for sight singing retain approximately the same relative positions as for dictation, though the coefficients in every case are smaller.
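The three groupings used in this section can be restated as a simple rule. The sketch rests on two assumptions that the section applies but does not spell out as a rule: a coefficient below four times its P.E. is "negligible", and one that passes that test and reaches .40 (taken here as the lower bound of Garrett's "substantial or marked" range) is "substantial"; anything in between is "slight". Applied to the first six rows of Table II it reproduces the grouping given in the text.

```python
def relationship_group(r: float, pe: float) -> str:
    """Assumed grouping rule: the four-P.E. test first, then magnitude."""
    if abs(r) < 4 * pe:
        return "negligible"
    if abs(r) >= 0.40:
        return "substantial"
    return "slight"


# (measure, r, P.E.) for the first six rows of Table II (sight singing).
TABLE_II_TOP = [
    ("Background Discrimination of Mode", 0.507, 0.043),
    ("K-D Pitch Imagery", 0.348, 0.049),
    ("Seashore Sense of Intensity", 0.331, 0.049),
    ("K-D Tonal Memory", 0.286, 0.052),
    ("Seashore Tonal Memory", 0.233, 0.053),
    ("K-D Pitch Discrimination", -0.184, 0.054),
]

if __name__ == "__main__":
    for name, r, pe in TABLE_II_TOP:
        print(f"{name:<36} {r:>6.3f}  {relationship_group(r, pe)}")
```

The output places one test in the substantial group, four in the slight group, and K-D Pitch Discrimination among the negligible ones, as in the text.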


Correlation between marks in harmony and scores on music tests. Table III contains the data from the correlation of harmony marks with test scores. It is immediately obvious that the total range of r's is less than half the range of either the dictation coefficients or the sight singing coefficients. There is little to comment upon when all coefficients are so low as to be almost insignificant.


One measure, Seashore Sense of Rhythm, falls in the slight relationship group; none falls in the marked group. The remainder show negligible relationship, judged both by Garrett's characterization and by the P.E.'s calculated. Examination of the whole distribution indicates that if the Seashore tests possess any predictive value for marks in harmony, the K-D tests possess less. While part of the cause for the low correlations with harmony may be the inability of the tests to measure ability as it actually functions in harmony, the coefficients must nevertheless be severely affected by the fact, mentioned earlier, that the variability of the marks in


TABLE III

COEFFICIENTS OF CORRELATION BETWEEN
MARKS IN HARMONY AND SCORES
ON MUSIC TESTS

Measures                                   r    P.E.     N
Seashore Sense of Rhythm                .269    .052   149
Kwalwasser Melodic Sensitivity         .178*    .054   149
Seashore Tonal Memory                  .157*    .054   148
Background Discrimination of Mode      .139*    .055   145
Seashore Sense of Consonance           .101*    .055   148
Seashore Sense of Pitch               -.076*    .050   149
K-D Intensity Discrimination          -.075*    .055   147
Seashore Sense of Time                -.057*    .055   150
Kwalwasser Harmonic Sensitivity        .056*    .056   147
K-D Rhythm Discrimination             -.040*    .055   147
K-D Melodic Taste                     -.033*    .056   147
Seashore Sense of Intensity            .021*    .055   150
K-D Pitch Discrimination               .021*    .056   147
K-D Tonal Memory                       .020*    .056   146
Background Rhythmic Discrimination    -.019*    .056   146
K-D Rhythm Imagery                     .017*    .057   144
K-D Pitch Imagery                     -.008*    .057   144
K-D Quality Discrimination            -.008*    .056   147
K-D Time Discrimination                .007*    .056   147
K-D Tonal Movement                     .004*    .056   147

* Coefficients less than four times their probable errors.

harmony is small (see Figure 2). Certainly,
it appears that some tests having a relation- 
ship to harmony, judging at least by their 
designated titles, should yield higher r’s. It 
seems reasonable to assume, in the light of 
the general evidence on individual differences, 
that the marks in harmony are not distributed 
in accordance with the actual distribution of 
abilities. Hence, the correlations, while accu- 
rate as far as the data go, are not the same 
as would be obtained for a true distribution 
of abilities. 
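The argument about narrow mark distributions can also be illustrated with synthetic data. One point is worth making explicit: a purely linear narrowing of marks (every mark moved proportionally closer to the mean) leaves r unchanged, so the depressing effect must come from uneven compression, such as marks held between a floor and a ceiling. The sketch below, with a hypothetical true correlation of .6 and an assumed 80 to 90 band, compares the three cases; none of the numbers come from the study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def narrowing_demo(n=5000, true_r=0.6, seed=1941):
    """Return r between test scores and (a) marks that follow ability,
    (b) the same marks squeezed linearly into a narrow band, and
    (c) the same marks clipped to an assumed 80-90 band."""
    rng = random.Random(seed)
    scores, ability = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        scores.append(true_r * a + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1))
        ability.append(a)
    marks = [80 + 8 * a for a in ability]           # spread like ability
    squeezed = [85 + 2 * a for a in ability]        # linear narrowing
    clipped = [min(max(m, 80), 90) for m in marks]  # floor 80, ceiling 90
    return (pearson_r(scores, marks),
            pearson_r(scores, squeezed),
            pearson_r(scores, clipped))


if __name__ == "__main__":
    full, squeezed, clipped = narrowing_demo()
    print(f"full: {full:.3f}  linear squeeze: {squeezed:.3f}  clipped: {clipped:.3f}")
```

The linear squeeze reproduces the full-range r exactly (up to rounding), while the clipped marks give a noticeably lower r, which is the kind of distortion the text attributes to the harmony marks.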


Correlation between marks in the history of music and scores on music tests. Table IV presents the coefficients of correlation between history of music marks and test scores. Many tests of the battery obviously have no direct connection with history of music, but, in order to make the records complete, computations were made for all of them.

It is evident that the low correlation coefficients cannot be attributed entirely to the inadequacy of the measures employed, but rather, at least partially, to the narrowed distribution of marks (see Figure 2). Only three coefficients are four times their probable errors, namely, those for K-D Time Discrimination, K-D Melodic Taste, and K-D Rhythm Imagery.

TABLE IV

COEFFICIENTS OF CORRELATION BETWEEN
MARKS IN HISTORY OF MUSIC AND
SCORES ON MUSIC TESTS

Measures                                   r    P.E.     N
K-D Rhythm Imagery                     -.416    .054   108
K-D Time Discrimination                 .277    .059   110
K-D Melodic Taste                       .254    .060   110
Kwalwasser Harmonic Sensitivity        .238*    .061   110
K-D Tonal Memory                       .213*    .062   110
Seashore Sense of Time                 .128*    .063   113
K-D Quality Discrimination            -.126*    .063   110
K-D Intensity Discrimination           .119*    .063   110
Seashore Sense of Rhythm               .095*    .064   112
K-D Pitch Discrimination              -.076*    .064   110
Background Rhythmic Discrimination    -.067*    .064   110
Seashore Tonal Memory                 -.066*    .064   111
Seashore Sense of Pitch               -.063*    .063   112
K-D Rhythm Discrimination             -.061*    .064   110
Seashore Sense of Consonance          -.055*    .064   111
Kwalwasser Melodic Sensitivity         .053*    .064   111
Seashore Sense of Intensity            .048*    .063   113
Background Recognition of Themes       .047*    .064   111
Background Discrimination of Mode      .046*    .064   111
K-D Tonal Movement                    -.024*    .064   110
K-D Pitch Imagery                     -.014*    .064   107

* Coefficients less than four times their probable errors.

The unusual r of -.416 for K-D Rhythm Imagery is impossible to explain logically. The original data and the computations have been checked and rechecked with the same results, so it seems probable that its size, at least, is due to chance. The r for K-D Melodic Taste scores and history of music marks is markedly higher than that for the same test with any other subject. Perhaps the student who has the superior background needed to make the kind of judgment required by this test is the one who, because of that background, is eager to learn more about music and to hear more of it, and who is therefore able more easily to earn high marks in history of music.


One test not used in other computations, 
and therefore not mentioned before, is the 
Background Recognition of Themes. This 
was included because it was felt that the 
student with broad background might have 
interests that would aid in learning history 
of music. The negligible r of .047, however, 
does not bear out this hypothesis, although 
it does not disprove it. As a simple examination of many of the test titles shows, there is no obvious reason for expecting high correlations with history of music.


THE PREDICTIVE VALUE OF VARIOUS
TEST BATTERIES


In order to compare the students’ perform- 
ance on each music test battery with marks 
in college courses, Table V was compiled. 
This table merely rearranges the data already 
presented in Tables I to IV inclusive. 


The Seashore Tests. As a group, the Seashore Tests indicate highest relationship with dictation and lowest relationship with history of music. Sense of Intensity ranks highest for dictation and sight singing; Sense of Rhythm ranks highest for harmony; Sense of Time ranks highest (though the r is very low) for history of music. The best test for all four criteria is Sense of Intensity; the poorest test is Sense of Pitch. It is evident that none of the coefficients is high enough to warrant the expectation that this battery can predict success in any of these subjects,

TABLE V

COEFFICIENTS OF CORRELATION BETWEEN SCORES IN FOUR BATTERIES OF MUSIC
TESTS WITH MARKS IN COLLEGE COURSES

Measures                                Dict. S. Sing.    Harm.    Hist.
Seashore Sense of Pitch                          .121*   -.076*   -.063*
Seashore Sense of Intensity                       .331    .021*    .048*
Seashore Sense of Time                           .171*   -.057*    .128*
Seashore Sense of Rhythm                         .142*     .269    .095*
Seashore Sense of Consonance                    -.053*    .101*   -.055*
Seashore Tonal Memory                             .233    .157*   -.066*
Kwalwasser Melodic Sensitivity                   .064*    .178*    .053*
Kwalwasser Harmonic Sensitivity                 -.045*    .056*    .238*
K-D Pitch Discrimination                        -.184*    .021*   -.076*
K-D Intensity Discrimination                     .148*   -.075*    .119*
K-D Time Discrimination                         -.076*    .007*     .277
K-D Quality Discrimination                       .109*   -.008*   -.126*
K-D Tonal Movement                               .064*    .004*   -.024*
K-D Tonal Memory                         .445     .286    .020*    .213*
K-D Melodic Taste                                .065*   -.033*     .254
K-D Rhythm Discrimination                        .166*   -.040*   -.061*
K-D Pitch Imagery                        .593     .348   -.008*   -.014*
K-D Rhythm Imagery                       .256    .089*    .017*    -.416
Background Discrimination of Mode        .645     .507    .139*    .046*
Background Rhythmic Discrimination       .387   -.029*   -.019*   -.067*
Background Recognition of Themes                                   .047*

Key: Dict., Dictation; S. Sing., Sight Singing; Harm., Harmony; Hist., History of Music.
* Coefficients less than four times their probable errors.


at least as they are presented and evaluated 
in the College of Music of Cincinnati. 
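Because Table V only rearranges Tables I to IV, the "ranks highest" statements for a battery can be read off mechanically. The sketch uses the Seashore rows for sight singing, harmony, and history of music from Tables II, III, and IV (the dictation column of Table I is not reproduced in this section, so it is omitted), and takes "highest" to mean the largest signed r, which is an assumption.

```python
COURSES = ("sight singing", "harmony", "history of music")

# Seashore coefficients (sight singing, harmony, history of music),
# copied from Tables II, III and IV of this section.
SEASHORE = {
    "Sense of Pitch": (0.121, -0.076, -0.063),
    "Sense of Intensity": (0.331, 0.021, 0.048),
    "Sense of Time": (0.171, -0.057, 0.128),
    "Sense of Rhythm": (0.142, 0.269, 0.095),
    "Sense of Consonance": (-0.053, 0.101, -0.055),
    "Tonal Memory": (0.233, 0.157, -0.066),
}


def highest_by_course(battery):
    """For each course, return the test with the largest signed r
    (an assumed reading of 'ranks highest')."""
    return {
        course: max(battery, key=lambda test, i=i: battery[test][i])
        for i, course in enumerate(COURSES)
    }


if __name__ == "__main__":
    for course, test in highest_by_course(SEASHORE).items():
        print(f"{course}: Seashore {test}")
```

The result agrees with the text for the three courses covered: Sense of Intensity for sight singing, Sense of Rhythm for harmony, and Sense of Time for history of music.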


The Kwalwasser Tests. The Kwalwasser Tests of Melodic and Harmonic Sensitivity yield consistently low coefficients in all calculations. Harmonic Sensitivity scores correlate highest with history of music marks, with an r of only .238, which is less than four times its probable error.


The K-D Tests. As a whole, the K-D Tests show negligible relationship with marks in college courses, but correlate highest with dictation and lowest with harmony. Pitch Imagery ranks highest for dictation and sight singing, and Time Discrimination for history of music; none is high enough to be of any value for harmony. The best single test for all four variables is Tonal Memory; the poorest is Pitch Discrimination.


The measures of musical background. Only two of the four Background Tests were correlated with all four criteria. These two, Discrimination of Mode and Rhythmic Discrimination, give the best prediction for dictation and the poorest for history of music. Of the two, Discrimination of Mode correlates higher with every variable. According to the coefficients computed, the Discrimination of Mode Test is outstanding in predictive power, and the Rhythmic Discrimination Test possesses enough value to be used for the prediction of dictation ability.


THE PREDICTIVE VALUE OF TESTS FALLING
IN VARIOUS TEST AREAS


The data discussed in preceding sections 
have been rearranged according to test areas 
for comparison. Table VI presents these 
coefficients of correlation. 


Within each area, some test reveals at least a small degree of relationship to marks in college courses. The highest coefficient shown in the table is that between dictation marks and K-D Pitch Imagery scores, yet the scores on the other pitch tests have a negligible relationship to course marks. Each of the test areas possesses more predictive value for dictation and sight singing than for either harmony or history of music. It must be concluded from these data that no particular trait area is in itself, and in terms of all tests seeming to fall within its boundaries, of outstanding value for prognosis of college success. At the same time, at least one test within each area does indicate some relationship, though small, to ability in college courses.

TABLE VI

COEFFICIENTS OF CORRELATION CALCULATED BETWEEN MARKS IN COLLEGE COURSES AND
SCORES ON MUSIC TESTS ARRANGED ACCORDING TO TEST AREAS

Test Areas    Tests                                 Dict. S. Sing.    Harm.    Hist.
Pitch         Seashore Sense of Pitch                          .121*   -.076*   -.063*
              K-D Pitch Discrimination                        -.184*    .021*   -.076*
              K-D Pitch Imagery                        .593     .348   -.008*   -.014*
Rhythm        Seashore Sense of Rhythm                         .142*     .269    .095*
              K-D Rhythm Discrimination                        .166*   -.040*   -.061*
              K-D Rhythm Imagery                       .256    .089*    .017*    -.416
              Background Rhythmic Discrimination       .387   -.029*   -.019*   -.067*
Intensity     Seashore Sense of Intensity                       .331    .021*    .048*
              K-D Intensity Discrimination                     .148*   -.075*    .119*
Time          Seashore Sense of Time                           .171*   -.057*    .128*
              K-D Time Discrimination                         -.076*    .007*     .277
Memory        Seashore Tonal Memory                             .233    .157*   -.066*
              K-D Tonal Memory                         .445     .286    .020*    .213*
Harmonic      Seashore Sense of Consonance                    -.053*    .101*   -.055*
Progression   Kwalwasser Harmonic Sensitivity                 -.045*    .056*    .238*
Melodic       Kwalwasser Melodic Sensitivity                   .064*    .178*    .053*
Progression   K-D Melodic Taste                                .065*   -.033*     .254

* Coefficients less than four times their probable errors.


THE PREDICTIVE VALUE OF GENERAL 
INTELLIGENCE TEST SCORES 


Table VII presents the data resulting from the correlation of course marks with intelligence test scores.


TABLE VII

COEFFICIENTS CALCULATED BETWEEN MARKS IN
COLLEGE COURSES AND SCORES ON THE
DETROIT INTELLIGENCE TEST

Courses                    r    P.E.     N
Dictation               .585    .039   128
Sight Singing           .430    .048   131
Harmony                 .299    .054   130
History of Music       .021*    .068    97

* Coefficient less than four times its probable error.


Many studies have been made of the relationship between musical ability and intelligence. Although the public in general seems to be of the opinion that the two do not occur together, a substantial relationship has actually been found.¹⁹ Mursell states in this connection that

“when functional criteria of musicality are employed, musical ability is found to be positively associated with intelligence.”²⁰


It is interesting to note, then, that marks in dictation correlate .585 with intelligence test scores, and that marks in sight singing correlate .430 with them. The r's computed between intelligence test scores and marks in these two subjects, which may probably be regarded as significant criteria of musicality, substantiate the claim that:

“The possession of musical capacities above the average carries with it a high degree of general intelligence. It is a fortunate provision of nature that those she endows with musical powers shall also possess sufficient intelligence to develop these powers to the utmost point of perfection.”²¹

¹⁹ James L. Mursell, The Psychology of Music, p. 337. New York: W. W. Norton and Company, 1937.
²⁰ Ibid., p. 150.


One would expect the relationship between harmony marks and intelligence test scores to be more than slight, since harmony has been classified earlier as a musical-intellectual subject; indeed, many individuals write harmony by rule as a purely intellectual process. Yet the coefficient calculated between harmony marks and intelligence is .299, indicating that relationship is present, but only in slight degree. That history of music marks and intelligence test scores should yield a correlation coefficient of .021 hardly seems consistent with the fact that history of music has been described as an intellectual subject.

From the data of the present study, it is evident that general intelligence is of predictive value for only two of the four courses, dictation and sight singing. The slight or negligible relationship of intelligence to harmony and history of music probably is due to the type of mark distribution, which apparently is not representative of the abilities measured.

²¹ Max Schoen, The Psychology of Music, p. 165. New York: Ronald Press Company, 1940.


SUMMARY 


This section has reported the data on 
twenty music tests and an intelligence test in 
terms of the criterion of success in the four 
college subjects of dictation, sight singing, 
harmony, and history of music. The findings 
are limited in significance by the fact that the 
distributions of marks for two subjects, har- 
mony and history of music, are unusually 
narrow, and possibly do not represent accu- 
rately the abilities of students in these fields. 
The conclusions reached may be listed as 
follows: 


1. Correlation of the tests with each of the four college courses:

a. Success in dictation is best predicted by three tests: Background Discrimination of Mode (.645), K-D Pitch Imagery (.593), and K-D Tonal Memory (.445); seven tests indicate slight predictive value for dictation; the r's of the remainder are less than four times their probable errors and therefore are negligible.

b. Success in sight singing is best predicted by one test only, Background Discrimination of Mode (.507); four other tests show slight predictive value; the remainder, because of their large P.E.'s, indicate little or no predictive value.

c. Success in harmony (as marks are distributed in this study) cannot be predicted by the music tests, since all r's except one (Seashore Sense of Rhythm) are less than four times their P.E.'s.

d. Success in history of music, as in harmony, cannot be predicted by any of the tests on the basis of the present data.

2. The predictive value of the four batteries of music tests:

a. The Seashore Tests correlate highest with dictation marks and lowest with history of music marks, but no coefficient is higher than .30.

b. The Kwalwasser Tests indicate little relationship with any of the criteria.

c. The K-D Tests correlate highest with dictation marks and lowest with harmony marks, at least two of the coefficients being well within the marked relationship group.

d. The Background Tests correlate highest with dictation marks and lowest with history of music marks, one of the coefficients being the highest of the whole battery of tests.

3. The predictive value of trait areas:

a. No particular trait area is in itself, and in terms of all tests seeming to fall within its boundaries, of outstanding value for prognosis of college success.

b. At least one test within each area does indicate some relationship, though small, to ability in college courses.

4. The predictive value of individual tests:

a. The Background Discrimination of Mode Test offers the best prediction of success in dictation and sight singing.

b. The Seashore Sense of Rhythm Test offers the best prediction of success in harmony (slight relationship).

c. The K-D Time Discrimination Test offers the best prediction of success in history of music (slight relationship).

d. The K-D Pitch Imagery and K-D Tonal Memory Tests give high prediction of success in dictation and slight prediction of success in sight singing.

5. The predictive value of the intelligence test:

a. General intelligence (as here measured) indicates substantial predictive value for success in dictation (.585) and in sight singing (.430).

b. General intelligence possesses slight predictive value for success in harmony (.299 from the present data).

c. General intelligence possesses negligible predictive value for success in history of music (.021 from the present data).


On the basis of these data, it is safe to conclude that no single measure should be used to predict success in college; a combination of music tests (Background Discrimination of Mode, K-D Pitch Imagery, and K-D Tonal Memory) with an intelligence test (Detroit Advanced Intelligence Test) may offer a prediction sufficiently accurate to be practical.

THE VALIDITY OF CERTAIN MEASURES FOR
PROGNOSIS OF PROFESSIONAL SUCCESS
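How much a combination of predictors can add to the best single test depends on how strongly the predictors correlate with one another, a figure this section does not report. The standard two-predictor multiple correlation makes the point. In the sketch, the dictation coefficients .645 (Background Discrimination of Mode) and .585 (Detroit test) come from the text, while the intercorrelations between the two predictors are hypothetical values chosen for illustration; the sketch is not the author's method.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion y with two predictors,
    computed from the three pairwise Pearson correlations:
    R**2 = (r_y1**2 + r_y2**2 - 2*r_y1*r_y2*r_12) / (1 - r_12**2)."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


if __name__ == "__main__":
    mode_dict, intel_dict = 0.645, 0.585   # validities for dictation (text)
    for r_12 in (0.0, 0.3, 0.6):           # hypothetical intercorrelations
        print(f"r_12 = {r_12:.1f}: R = {multiple_r(mode_dict, intel_dict, r_12):.3f}")
```

With uncorrelated predictors R is about .87; as the hypothetical intercorrelation rises from 0 to .6, R falls toward the single best coefficient of .645, so the practical value of the proposed combination would have to be judged with the actual intercorrelation of the tests.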


DEFINITION AND DISCUSSION OF THE 
CRITERIA FOR PROFESSIONAL 
SUCCESS 


General statements. Professional success is a term used in this study to indicate a general degree of attainment in the pursuit of some vocation in which musical talent* is the major essential. The following paragraph states certain assumptions on which the work described in this section is based. These assumptions seem reasonable in the light of the literature on musical traits and abilities.

This fundamental musical talent, which 
various psychologists and musicians have 
attempted to measure with some degree of 
success, manifests itself in varying ways in 
different individuals. High attainment as a 
performer is not, of necessity, of greater im- 
portance than is success in college teaching. 
The performer who becomes an eminently successful concert artist may lack almost completely the special attributes and personal characteristics that make an excellent teacher; yet both may be fine musicians. Unless the professional performer with extraordinary technique seems to possess the fine qualities of the progressive student, he is not likely to enjoy continued success; hence, he needs to be judged on more than a meteoric rise to musical prominence. Musicianship is
more fundamental and comprehensive than 
technique. The concert master of a symphony 
orchestra, eminent in his field, may be a fine 
musician, yet wholly unable to teach a stu- 
dent to play the violin. A successful voice 
teacher at the college level may be a complete 
failure if expected to teach singing in a public 
school, although both positions require unusual talent and attainment. The vocal or instrumental teacher in the public schools who is judged successful in that work may be exceptionally competent as a musician, yet
may lack the qualities necessary to handle a 
position requiring supervisory or administra- 
tive duties. Some musicians in both the per- 
forming and the teaching fields are content 
with their level of attainment, and have no 
ambition toward further study. The student 
who is pursuing or who has pursued very 


* This term is used in a general sense. It seems likely that the plural form, “musical talents,” is more accurate.
advanced study, while possessing superior
musical excellence of the theoretical type 
necessary for the attainment of high academic 
honors, may, nevertheless, lack the attributes 
needed for either teaching or professional 
performance. It is evident, then, that profes- 
sional success in music may take several defi- 
nite directions; yet, in each division of the 
professional music field, there is a place at 
the top only for those musicians who have an 
abundance of what is known as musical 
talent, a term which indicates fundamental 
musical excellence of the kind that has been 
treated elsewhere in this study. 

Criteria for five levels of professional attainment. On the assumption that professional success probably is distributed in linear
fashion, as the traits that constitute musical 
talent seem to be, five categories or levels of 
success are defined in this section, for the 
classification of former College of Music stu- 
dents. Students to be classified were those 
whose school records formed the basis of the 
investigation in prognosis of college success. 
The criteria were set up with the assistance 
of various members of the faculty of the Col- 
lege of Music of Cincinnati (including the 
director of the College, director of studies, 
and a member of the department of school 
music), and of Teachers College, University 
of Cincinnati (including a specialist in edu- 
cational measurement and a psychologist with 
special training in the psychology of music). 

Since the criteria embrace all the types of 
musical endeavor discussed earlier in this 
section, it is not expected that any one indi- 
vidual will meet all of the requirements 
within a given category. The various require- 
ments may be regarded as alternatives, con- 
sidered approximately equal in significance. 


Group I
a. Teacher in college
   1. An expert in the teaching of applied or theoretical music
   2. An expert in the teacher training field (position gained by a succession of advancements)
b. Director of music in city school system (supervising staff of music teachers)
c. Performer
   1. Concert or operatic artist
   2. Eminent conductor
   3. Member of a ranking symphony orchestra
d. Composer of works performed by recognized organizations
e. Musical director of a large radio station
f. Advanced student
   1. Winner of scholarship for study abroad
   2. Holder of advanced degree

Group II
a. Teacher in elementary or secondary school
   1. Quality of work resulting in repeated reengagement
   2. Quality of work resulting in promotion to larger school system
b. Private teacher of applied music, producing very advanced pupils
c. Performer
   1. Radio staff artist
   2. Outstanding member of a “name” band
   3. Organist and director of music in a large church
d. Possessor of bachelor’s degree plus some graduate study

Group III
a. Teacher in elementary or secondary school (mediocre quality of work resulting in no promotion)
b. Private teacher of applied music producing pupils of average attainment (the pupil possessing unusual ability will have transferred to another teacher)
c. Performers
   1. Player or singer on “spot” radio programs
   2. Person filling occasional professional engagements
   3. Paid soloist in average church choir
d. Possessor of bachelor’s degree plus sporadic graduate study

Group IV
a. Teacher in elementary or secondary school with just sufficient success to remain in teaching through frequent transfers
b. Private teacher of applied music unable to establish and maintain a regular class of pupils
c. Performer failing to hold a steady engagement
d. Students with some success in college work in music and continuing study or practice of music as an avocation, but not with the objective of a career


Group V 

a. Teacher or performer changing to an- 
other profession after failing completely 
in musical pursuits 

b. Student changing to another occupation before failing in musical pursuits (evidence that the probability of failure was recognized)

c. Student failing completely in college 
work in music 


PROCEDURE FOR COMPILING THE DATA 


Procedure used in tracing former students. Having defined the five levels of success
for the classification of former College of 
Music students, it was necessary to locate as 
many as possible of the students who had par- 
ticipated in the testing program from 1930 
to 1935. In many instances the investigator 
was able to locate former students through 
the records in the College of Music business 
office, or through personal contacts. In other 
cases, the records were incomplete. 

In October, 1939, a letter was mailed to all 
students entering the College of Music from 
1930 to 1935 who had participated in the 
testing program, and for whom the records 
were incomplete. The letter was sent in each 
case to the latest available address of the 
student. This letter stated the intention of 
the College of Music to ascertain the present 
status of its former students. An enclosed 
questionnaire requested information regard- 
ing marital status, honors received since leav- 
ing the College of Music of Cincinnati 
(diploma or degree), and occupation (teach- 
ing privately, teaching in the public schools, 
teaching in college, playing or singing pro- 
fessionally, pursuing advanced study, or pur- 
suing another occupation). 

Classification into groups.—The informa-
tion secured through the questionnaire was 
merely a starting point for the making of 
contacts with persons qualified to judge the 
level of professional success of the former 
College of Music students. 

September, 1941]

No decision as to classification was based
solely upon information obtained from the
person classified. In fact, the individuals
under consideration were totally ignorant of
this procedure. In all instances, more than
one person competent to judge was consulted. 
Authorities questioned included administra- 
tive officers, professional colleagues, employ- 
ers, and other individuals who were qualified 
to pass opinion, or who possessed authorita- 
tive information. In the case of a teacher, 
the investigator sought the judgment of the 
principal or principals of schools in which the 
teacher worked, of room teachers who 
observed his professional ability, and, if
possible, of a colleague. Obviously, so careful
an investigation was not possible for every
individual. Therefore, when a case seemed
dubious and when only uncertain and frag-
mentary information was available, no assign-
ment to one group or another was made.
Because of the nature of the information 
divulged, all sources, of course, must be 
regarded as confidential. 


In the course of this procedure, a total of 
one hundred and fifty individuals were con- 
sidered. Of these, ninety-three were finally 
classified in one or the other of the various 
groups. In each of these cases, sufficient data 
were available to justify a definite assignment 
to one of the groups. From three to five 
judges participated in the decisions basic to 
this classification. 


Typical cases and their classification.—A
detailed description of one or two cases may 
clarify the type of group assignment made. 
Case A, a young woman, possessor of an extra- 
ordinarily beautiful mezzo-soprano voice and 
an unusual pianistic ability (a rare combina- 
tion), became, almost immediately upon her 
graduation, a member of the musical staff of 
an average size radio station. After a very 
successful period (tenure in a radio station 
is notoriously brief and insecure), she was 
asked to become a teacher of piano and voice 
in a small northern college. A steady increase 
of interest in her department led to a higher 
salary. During her teaching period, she con- 
tinued to sing and play professionally, giving 
recitals, taking minor roles in a well-known 
opera company, and singing leading roles in 
local operatic productions. In addition, she 
did not neglect further study in academic and 
applied music fields. As a result of her accom- 
plishments, she was asked to become a 
teacher of piano and voice in a large state 
teachers college. Declining a higher salary
offered by her administrator, she decided to accept
and began another period of successful work.
Now, although she has married, she has
acquired another degree, is teaching, giving 
recitals, and is gaining recognition in the field 
of choral conducting. This young woman was 
assigned to Group I. 

Case B, also a young woman, after leaving 
the College of Music, began to teach before 
receiving a university degree. Her first posi- 
tion was in a village school where she taught 
vocal music. Although apparently she was 
liked very well personally (it was her home 
community), there was general dissatisfaction 
with her work; her classes did not sing well, 
even in performance, and she displayed com- 
plete inability to play the piano accompani- 
ments for them. Even at commencement her 
music was badly prepared. A continued
account of her work would tell something of 
the same story. She secured a position in a 
county system, moving from one school to 
another at the end of each year. Apparently, 
her personal attributes, alone, have kept her 
teaching, not her musical aptitude. It is
quite possible that she might have been
successful in the teaching of dramatic art for
which she originally prepared herself. She 
was placed in Group IV. 


STATISTICAL TREATMENT OF THE DATA 


General purpose.—The general purpose of 
the procedure was to discover the relation- 
ship between classification by professional 
success in one or the other of the five groups, 
and the degree of attainment in the various 
tests and measures considered as having 
possible prognostic value. 

Technique.—An appropriate technique for handling group comparisons of this kind is the coefficient of mean square contingency, C, developed by Pearson.22 This statistical device may be used when it is desirable to calculate the relationship between two characteristics that are unordered, or between two characteristics only one of which is ordered.23 The term “ordered” is used to signify that individual cases can be identified by position on a numerical scale. The term “unordered” signifies that cases are classified in independent categories identified apart from any possible quantitative values. The contingency technique, as used in this study, involves turning ordered series into groupings by categories. For purposes of predicting success, the procedure that would seem most natural, in the light of psychologists’ reports on individual differences, is to divide students taking a given prognostic test into a limited number of groups based on their scores: one group from which the least professional success is expected, one from which the greatest success is expected, and the others covering intermediate levels of success. From the standpoint of the functional utility of prognostic tests, these groups should be divided from each other at points so chosen as to yield the most accurate prediction possible (ideally, there should exist a one-to-one correspondence between classification by the test and classification in terms of the criterion). For purposes of statistical computation of C, however, the divisions may be made anywhere.

22 Henry E. Garrett, Statistics in Psychology and Education, p. 387. New York: Longmans, Green and Co., 1937.

23 Karl J. Holzinger, Statistical Methods for Students in Education, p. 273. Boston: Ginn and Co.
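The section names Pearson's C but does not give its formula. A common textbook definition computes it from the chi-square statistic of the table of observed frequencies, C = √(χ² / (χ² + N)), where N is the total number of cases. The Python sketch below assumes that definition. The function name and the two example tables are hypothetical and are not data from this study.

```python
import math


def contingency_coefficient(table):
    """Pearson's coefficient of mean square contingency, C = sqrt(chi2 / (chi2 + N)).

    `table` is a list of rows of observed frequencies; for example, rows could be
    test-score categories and columns the five professional-success groups.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table contains no cases")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in this cell if the two classifications were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical 5 x 5 tables (not data from this study).
perfect = [[10 if i == j else 0 for j in range(5)] for i in range(5)]
independent = [[4] * 5 for _ in range(5)]
print(round(contingency_coefficient(perfect), 3))      # 0.894
print(round(contingency_coefficient(independent), 3))  # 0.0
```

When every case falls on the diagonal of a 5 × 5 table with equal marginal totals, C reaches about .894. When the two classifications show no association, C is 0.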

As the students were classified for profes- 
sional success, Group I contained eleven grad- 
uates; Group II, twenty-five; Group III, 
thirty-five; Group IV, twelve; and Group V, 
ten. The investigator assumed that the abil- 
ities measured by the tests were distributed 
in a similar fashion. Therefore, the test dis- 
tributions were subdivided arbitrarily in such 
a manner that the number of cases in the 
categories corresponded approximately with 
those in the distribution for professional 
attainment. 
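The subdivision rule described above cuts each test distribution so that its categories hold about as many cases as Groups I through V. It can be sketched as a rank-and-fill procedure. The function name is hypothetical. The text also does not say how tied scores at a cut point were handled, and this sketch simply breaks ties by input order.

```python
# Sizes of professional-success Groups I-V as reported in the text (total 93).
GROUP_SIZES = [11, 25, 35, 12, 10]


def categorize_by_quota(scores, sizes=GROUP_SIZES):
    """Assign each score a category 0..len(sizes)-1 (0 corresponds to Group I).

    Cases are ranked from highest to lowest score; the top sizes[0] cases go to
    category 0, the next sizes[1] to category 1, and so on. Ties are broken by
    input order, so cases with equal scores may land in different categories.
    """
    if len(scores) != sum(sizes):
        raise ValueError("number of scores must equal the sum of the group sizes")

    # sorted() is stable, so equal scores keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    categories = [0] * len(scores)
    start = 0
    for category, size in enumerate(sizes):
        for i in order[start:start + size]:
            categories[i] = category
        start += size
    return categories


# Hypothetical scores 0..92, one per classified student.
cats = categorize_by_quota(list(range(93)))
print([cats.count(k) for k in range(5)])  # [11, 25, 35, 12, 10]
```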

To test the results of a change in the points 
of subdivision, the ranges of the scores on 
each test were divided into five equal parts, 
and coefficients of contingency were calculated 
on this basis. It was found that some coeffi- 
cients were higher, some were lower, and a 
few were approximately the same, in compar- 
ison with the coefficients computed on the 
basis indicated in the preceding paragraph. 
Were the test results to be used for guidance 
purposes, the scores should be classified in 
such a way as to yield the most accurate pre- 
diction possible. The evidence shows that the 
points of subdivision of the distribution of 
scores do affect the size of C. 
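The alternative check divides the range of scores on each test into five equal parts, and it might look like the following. Again the name is hypothetical. The sketch assumes the range runs from the lowest to the highest observed score. It numbers the intervals so that 0 is the top fifth, matching the direction of Groups I to V. Cross-tabulating either categorization against the professional-success groups gives the table from which C is computed, and that is how the effect of the cut points on C could be compared.

```python
def categorize_by_equal_range(scores, parts=5):
    """Split the observed score range into `parts` equal-width intervals.

    Returns a category for each score, 0 for the highest interval through
    parts - 1 for the lowest, so the numbering runs in the same direction as
    Groups I-V. A score on an interior boundary goes to the higher interval
    (up to floating-point rounding); the maximum score goes to the top interval.
    """
    low, high = min(scores), max(scores)
    if high == low:
        raise ValueError("all scores are equal; the range cannot be divided")
    width = (high - low) / parts
    categories = []
    for s in scores:
        k = min(int((s - low) / width), parts - 1)  # 0 = lowest interval
        categories.append(parts - 1 - k)             # reverse so 0 = highest
    return categories


print(categorize_by_equal_range([0, 10, 20, 30, 40, 50]))  # [4, 3, 2, 1, 0, 0]
```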

The following interpretation, however, is 
based upon data obtained on the basis of the 
arbitrary classification first described, with 
the number of cases in the various groups 
corresponding approximately to the number 
of cases in the various categories for profes- 
sional success. For purposes of discovering 
and analyzing general relationships, this pro- 


[Vol. 10, No. r 


cedure seems likely to be as significant as any 
other that would employ the contingency 
coefficient. 


PREDICTIVE VALUE OF SCORES ON MUSIC
TESTS FOR PROFESSIONAL SUCCESS


Table IX lists, by battery, the coefficients of contingency calculated between ratings on professional success and scores on the music tests. Each battery will be discussed separately, and summary statements will then be drawn.


TABLE IX


COEFFICIENTS OF CONTINGENCY CALCULATED
FOR RATINGS ON PROFESSIONAL SUCCESS
AND SCORES ON MUSIC TESTS


Tests                                        C
Seashore Sense of Pitch
Seashore Sense of Intensity
Seashore Sense of Time
Seashore Sense of Rhythm
Seashore Sense of Consonance
Seashore Tonal Memory
Kwalwasser Melodic Sensitivity             .481
Kwalwasser Harmonic Sensitivity            .206
K-D Pitch Discrimination
K-D Intensity Discrimination
K-D Time Discrimination
K-D Quality Discrimination
K-D Tonal Movement
[illegible]
K-D Melodic Taste                          .471
K-D Rhythm Discrimination
[illegible]
K-D Rhythm Imagery
Background Discrimination of Mode          .525
Background Rhythmic Discrimination
Background Discrimination of Mood
Background Recognition of Themes           .206


No probable errors or corrections have been computed, since

“for 5 x 5 fold or finer classifications, this correction is so small that for practical purposes it may be disregarded.”24

The highest possible C is .894 for the fivefold classification, according to Yule’s table as quoted by Holzinger.25
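For a square k × k classification, the ceiling on C is usually given as √((k - 1)/k). If that formula is assumed, k = 5 reproduces the .894 figure attributed to Yule's table. The helper name is hypothetical.

```python
import math


def max_contingency_coefficient(k):
    """Upper limit of Pearson's C for a square k x k table: sqrt((k - 1) / k)."""
    if k < 2:
        raise ValueError("need at least a 2 x 2 table")
    return math.sqrt((k - 1) / k)


for k in (2, 3, 4, 5):
    print(k, round(max_contingency_coefficient(k), 3))
# 2 0.707
# 3 0.816
# 4 0.866
# 5 0.894
```

Because this ceiling falls below 1, coefficients from tables of different sizes are not directly comparable. This is one reason the text states the limit for its own fivefold classification.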

The Seashore Tests.—The coefficients range from .341 to .473. This relatively small range indicates that no test will predict professional success either to a very high or to a very low degree, yet some amount of prediction does exist. Sense of Consonance and Sense of Time rank highest, with Tonal Memory and Sense of Rhythm so close behind that practically no differentiation is evident. It is interesting that Sense of Consonance, which is known to be the least reliable test of the six and which has been replaced in the 1939 revision by a test of timbre, gives the highest coefficient of the group. It would seem, then, that the arrangement of coefficients from high to low is purely one of chance, and that another order easily might be obtained.

24 Garrett, op. cit., p. 391.
25 Karl J. Holzinger, op. cit., p. 277.

The Seashore tests measure sensory capac- 
ities, the possession of which is regarded as 
essential to the really great musician. How- 
ever, the four elements of musicality which 
Seashore describes as the tonal, the dynamic, 
the temporal, and the qualitative26 rarely
occur in equal amounts in any one individual;
rather, one element usually tends to dominate 
another, thus leading different persons into 
the various fields of musical endeavor referred 
to earlier in this section. Certainly, the indi- 
vidual who possesses a defective sense of pitch 
could never successfully play a stringed in- 
strument, but might win a name for himself 
as a pianist. An individual lacking the tem- 
poral element might never become a good 
conductor, but might have the ability to com- 
pose music for another to conduct. 

It is evident that success in music as a 
profession requires much more than the 
possession of a few sensory capacities. As 
Seashore himself puts it, 


“for high rating in music, numerous other 
factors must be considered, such as re- 
sources, conflicting interests, the will to 
achieve, and especially the power of appli- 
cation, and of hard and continuous work.”27


The Kwalwasser Tests and the Kwalwasser-Dykema Tests.—The range of the coefficients of contingency for these tests is somewhat larger than that for the Seashore tests. The lowest is .206, for Kwalwasser Harmonic Sensitivity, and the highest is .619, for K-D Pitch Imagery. With the exclusion of the latter coefficient, the upper limit of the range is .471, substantially the same as the upper limit for the Seashore tests. The fact that the K-D Pitch Imagery test indicates a somewhat marked relationship to professional success is not altogether unexpected, since the same test revealed a relatively high relationship to college success. Seashore’s statement is pertinent:

“Success or failure in music depends upon the capacity for living in a tonal world through productive and reproductive imagination . . . His (the musician’s) memory and imagination are rich and strong in power of concrete, faithful and vivid tonal imagery; this imagery is so fully at his command that he can build the most complex musical structures and hear and feel all the effects of every detailed element before he has written down a note or sounded it out by voice or instrument. This capacity, I should say, is the outstanding mark of a musical mind at the representation level—the capacity of living in a representative tonal world . . . Take the image from the musical mind and you take out its very essence.”28

26 Carl E. Seashore, Psychology of Music, p. 4. New York: McGraw-Hill Book Company, 1938.
27 Ibid., p. 338.


The authors of these tests have, like Sea- 
shore, attempted to measure, with a few ex- 
ceptions, elements such as pitch, time, and 
rhythm that in themselves are obviously lim- 
ited indices of musical ability. A test that is 
highly prognostic probably must be of an- 
other type, and at best an extremely high co- 
efficient hardly can be hoped for because of 
the factors other than musical ability that are 
involved in professional success. 

The Measures of Musical Background.—The contingency coefficients of the four tests of this group range from .206 to .525. It appears that a freshman’s knowledge of musical literature, as measured by the Recognition of Themes test, has little relationship to his subsequent professional success, although the accomplished musician can be expected to identify the well-known compositions of the masters. Apparently, the musically talented and the musically untalented do not differ greatly in their knowledge of musical literature when they enter as freshmen. On the other hand, the Discrimination of Mode test apparently requires an ability that musically superior persons are more likely to possess. Its coefficient of .525 indicates higher predictive value than that of any test in any battery except K-D Pitch Imagery. This Discrimination of Mode test also revealed a high relationship to college success. This group of tests measures perceptual capacities rather than purely sensory ones. On the whole, these tests predict professional success about as well as the other batteries. It should be remembered that the Measures of Musical Background were administered in their original form. They did not have the benefit of the recording, revision, and standardization that have since been carried out.29 It seems reasonable to suppose that the revised battery possesses prognostic power at least as great as that revealed in these data and, in all probability, greater.

28 Carl E. Seashore, op. cit., pp. 5-6.

Analysis of data classified in terms of test 
areas.—The data discussed in an earlier sec- 
tion have been rearranged in Table X in 
terms of test areas of musical talent rather 
than the arrangement of tests into batteries. 


TABLE X


COEFFICIENTS OF CONTINGENCY CLASSIFIED
IN TERMS OF TEST AREAS


Test Areas            Tests                                     C
Pitch                 Seashore Sense of Pitch
                      K-D Pitch Discrimination
                      K-D Pitch Imagery                        .619
Rhythm                Seashore Sense of Rhythm
                      K-D Rhythm Discrimination
                      K-D Rhythm Imagery                       .44
                      Background Rhythmic Discrimination
Intensity             K-D Intensity Discrimination
                      Seashore Sense of Intensity
Time                  Seashore Sense of Time                   .4
                      K-D Time Discrimination
Memory                Seashore Tonal Memory
                      K-D Tonal Memory
Harmonic Progression  Seashore Sense of Consonance
                      Kwalwasser Harmonic Sensitivity          .206
Melodic Progression   K-D Melodic Taste                        .471
                      Kwalwasser Melodic Sensitivity           .481

Almost all of the tests within a given area indicate similar degrees of relationship to professional success. The exceptions are K-D Pitch Imagery, which yields a coefficient much higher than those of the other two pitch tests, and K-D Time Discrimination, which reveals a coefficient considerably lower than that of Seashore Sense of Time. A third is Kwalwasser Harmonic Sensitivity, which shows a coefficient much lower than that of Seashore Sense of Consonance. The rhythm tests, the melodic progression tests, and the memory tests are the most consistent. There is no evidence that any one trait area is more valuable than another, yet within each area some test does show predictive value.

Summary.—The coefficients of contingency between ratings on professional success and scores on music tests reveal that each battery of tests tried out predicts professional success to some degree. The prediction is not sufficient, however, to warrant accurate guidance on the basis of test scores alone. Only two tests, K-D Pitch Imagery and Background Discrimination of Mode, indicate a more than moderate amount of prediction for professional success. The various trait areas in which the tests fall seem to be about equally significant in their relationship to professional success.

29 Corwin H. Taylor, “The Construction and Validation of Certain Experimental Measures of Musical Potentiality.” Doctor’s dissertation, University of Cincinnati.


THE PREDICTIVE VALUE OF COLLEGE COURSE 
MARKS, INTELLIGENCE TEST SCORES, 
AND PROFESSORS’ ESTIMATES FOR 
PROFESSIONAL SUCCESS 


Introduction.—The primary concern of this 
investigation is with the predictive power of 
the various music tests. However, it is also 
of interest to discover the predictive signifi- 
cance of certain other data: the marks earned 
in certain college courses, intelligence test 
scores, and professors’ estimates. It may be 
noted that, from this point of view, marks in 
college courses, which were treated in an 
earlier section as criteria of success, are here 
treated as the basis of prediction, and are 
evaluated against the criterion of professional 
success. 

In order that the distributions of marks in 
the four college subjects may be compared, 
Figure 3 has been constructed. Since only 
part of the students whose marks were em- 
ployed as criteria of college success were as- 
signed to attainment levels, there are fewer 
cases involved in the calculation of the con- 
tingency coefficients than of the correlation 
coefficients. It is evident that the general spread of the marks for this smaller group is similar to that for the full group (compare Figures 1 and 2 with Figure 3).

Table XI presents the coefficients of con- 
tingency between ratings on professional suc- 
cess and college course marks, intelligence 
test scores, and professors’ estimates. 

The college subjects.—Of the four college subjects, success in sight singing shows the greatest amount of relationship to professional success, with a coefficient of .582. Success in dictation follows with a coefficient of .538. In the previous discussion of these two subjects, it was emphasized that ability in “ear training,” as these subjects are sometimes called, is generally recognized as a reliable index of musicality. It is to be expected, then, that marks in these subjects will show at least a moderate amount of predictive value for professional success. There are some branches of the musical profession in which an individual may achieve success without having developed his powers of auditory imagery to a high degree. Even so, the person who does possess this ability is likely to be found at the top of the ladder of success.

Fig. 3. Distribution of Grades in Four Courses, for Subjects Whose Grades Were Correlated with Ratings on Professional Success.

The subjects harmony and history of music yield coefficients of .341 and .349, respectively. On the basis of size of coefficient, the predictive value of these marks for professional success is less than that of marks in dictation and sight singing. The unusual form of the distributions probably affects the size of the coefficient, although the arbitrary grouping of the marks for the calculations may have minimized this effect.


TABLE XI


COEFFICIENTS OF CONTINGENCY CALCULATED
FOR RATINGS ON PROFESSIONAL SUCCESS
AND COLLEGE COURSE MARKS, IN-
TELLIGENCE TEST SCORES, AND
PROFESSORS’ ESTIMATES


Measures                                        C
Courses
  Dictation                                   .538
  Sight Singing                               .582
  Harmony                                     .341
  History of Music                            .349
Detroit Advanced Intelligence Test            .501
Professors’ Estimate of Talent                .501
Professors’ Estimate of Application           .541
Professors’ Estimate of Achievement           .518

It is clear that success in college subjects has a moderate amount of predictive power for subsequent professional success. The subjects evidencing the largest amount of relationship are sight singing and dictation.

Intelligence.—The importance of intelligence to the truly musical person has been discussed earlier in this study and will not be elaborated here. Since the relationship of intelligence to college success is relatively high, it is to be expected that intelligence test scores will show not a little predictive value for professional success. This expectation is borne out by the coefficient of .501. Seashore says in this connection:

It is a common observation that a person’s status in life is determined in large part by the degree and kind of intelligence . . . The musical profession makes as high a demand upon the intelligence as any other profession. Rating on intelligence as a supplement to measurement of musical talent is one of the best indices for the prediction of success in musical education or a musical career.30


Professors’ estimates.—Three sets of esti-
mates by professors were correlated with 
ratings of professional success by the con- 
tingency method. The coefficients are .501 
for estimates of talent, .518 for estimates of 
achievement, and .541 for estimates of appli- 
cation. These estimates appear to be about 
as valuable for the prediction of professional 
success as are marks in the two college sub- 
jects of sight singing and dictation, and scores 
on the Detroit Advanced Intelligence Test. 


COEFFICIENTS OF CONTINGENCY FOR THE
COMPLETE BATTERY


Table XII shows the complete list of co- 
efficients of contingency from the largest to 
the smallest. The coefficients range from .201 
to .619, Background Recognition of Themes 
being at the lower extremity and K—D Pitch 
Imagery at the upper. This table is included 
merely as a matter of interest and needs no 
elaborate discussion. In the complete list, two 
college subjects are notably high and two 
low; professors’ estimates and intelligence are 
high; the music tests are scattered through- 
out the entire range. 


SUMMARY AND CONCLUSIONS 


This section has been devoted to a discus- 
sion of the validity of certain measures for 
prognosis of success in music as a profession. 


30 Carl E. Seashore, “Musical Intelligence,” Music Educators Journal, XXIV (March, 1938), 33.


TABLE XII


COEFFICIENTS OF CONTINGENCY FOR
THE COMPLETE BATTERY


Measures                                        C
K-D Pitch Imagery                             .619
Sight Singing                                 .582
Professors’ Estimate of Application           .541
Dictation                                     .538
Background Discrimination of Mode             .525
Professors’ Estimate of Achievement           .518
Professors’ Estimate of Talent                .501
Detroit Advanced Intelligence Test            .501
Kwalwasser Melodic Sensitivity                .481
Seashore Sense of Consonance
Seashore Sense of Time
[illegible]
Seashore Tonal Memory
K-D Intensity Discrimination
K-D Rhythm Discrimination
Seashore Sense of Rhythm
K-D Rhythm Imagery
Background Discrimination of Mood
K-D Tonal Movement
[illegible]
Seashore Sense of Intensity
Background Rhythmic Discrimination
K-D Pitch Discrimination                      .3
K-D Quality Discrimination
[illegible]
[illegible]
Seashore Sense of Pitch
K-D Time Discrimination
Kwalwasser Harmonic Sensitivity               .206
Background Recognition of Themes              .201


The College of Music of Cincinnati entered 
into this investigation as a follow-up of the 
elaborate testing program described in a pre- 
ceding section. 


Procedure.—The study in the prognosis of
professional success in music was organized
and carried out according to the following
steps:

1. A general definitive description of the
attributes that constitute professional success
was developed. It was assumed that musical 
talent takes varying forms in different indi- 
viduals, and that a musician may be judged 
successful in any one of many branches of 
musical endeavor. 

2. Criteria for five levels of professional 
success were set up. In brief, the require- 
ments were as follows: Group I, teacher in 
college, director of music in a city school sys- 
tem, recognized composer, performer of note, 
musical director of radio station, or advanced 
student; Group II, school music teacher earn- 
ing promotion or tenure, private teacher pro- 
ducing advanced pupils, performer (such as 
radio staff artist), or holder of degree, plus 
some graduate work; Group III, school music
teacher earning no promotion, average private
teacher, performer earning occasional engagements,
or holder of bachelor’s degree with
possible sporadic graduate study; Group IV, 
teacher or performer unable to hold a steady 
engagement, but successful enough to stay in 
music as a profession, or student continuing 
work in music as an avocation; and Group V, 
teacher or performer changing to another pro- 
fession after failure in musical pursuits, or 
student failing in college work in music. 

3. Information was gathered, through per- 
sonal contacts and through questionnaires, 
concerning the professional status in 1939 of 
those former College of Music students who 
had participated in the testing program from
1930 to 1935.

4. Using the criteria developed, ninety- 
three of these students were classified as to 
degree of professional attainment. This clas- 
sification was made by a consensus of from 
three to five judges, on the basis of informa- 
tion furnished by qualified persons (such as 
administrative officers, employers, and pro- 
fessional colleagues). In no case was the 
judgment made from information furnished 
by the person to be classified. 

5. In order to compare the students’ levels 
of professional success with their performance 
on various measures while they were students 
in the College of Music, coefficients of mean 
square contingency were calculated. 

Results.—The results may be summarized
as follows: 


1. The predictive value of the music tests: 

a. All of the batteries of tests will predict
professional success to a degree, but not
sufficiently to warrant accurate guidance
on the basis of test scores alone.

b. Two tests, K-D Pitch Imagery and
Background Discrimination of Mode,
show substantial predictive power for
professional success, the coefficients
being .619 and .525 respectively.

c. No one trait area is of more value than
another for prediction of professional
success.


2. The predictive value of college course 
marks, intelligence test scores, and professors’ 
estimates for professional success: 


a. Success in sight singing and success in dictation indicate substantial predictive power for professional success, the coefficients being .582 and .538 respectively.

b. Success in harmony and success in history of music evidence much less predictive value than success in sight singing and success in dictation, the coefficients for the former being .341 and .349 respectively.

c. Intelligence test scores show a marked relationship to professional success, the coefficient being .501.

d. Professors’ estimates of talent, achievement, and application evidence about as much predictive value as marks in sight singing and dictation and scores on the Detroit Advanced Intelligence Test, the coefficients for the estimates being .501, .518, and .541 respectively.


3. Conclusion.—The best prediction of success
in music as a profession is probably given
by a combination of two music tests (K-D
Pitch Imagery and Background Discrimina- 
tion of Mode), two college courses (dictation 
and sight singing), an intelligence test (De- 
troit Advanced Intelligence Test), and pro- 
fessors’ estimates of ability (Talent, Achieve- 
ment, and Application). 


MINOR PROBLEMS IN THE 
RELATIONSHIP OF VARIOUS 
TRAIT MEASURES 


THE RELATIONSHIP OF INTELLIGENCE
TO MUSIC TEST SCORES


Table XIII presents the coefficients of 
correlation computed between intelligence 
test scores and music test scores. These r’s 
are arranged according to music test batteries. 

The Seashore tests have had perhaps more than their share of attention; at least they have formed a part of many more investigations than have the Kwalwasser or K-D tests. Coefficients found in the past range from .01 to .58 (Mursell’s summary table).31 However, the majority of these r’s are less than four times their probable errors and are therefore of negligible value.
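The "four times the probable error" rule can be made concrete with the classical formula PE = 0.6745 (1 - r²)/√N, assuming that is the probable error the text has in mind. This section does not report N for the correlations in Table XIII. The example below therefore uses N = 185, the total number of students in the testing program, purely for illustration. The function names are hypothetical.

```python
import math


def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def exceeds_four_pe(r, n):
    """True when |r| is at least four times its probable error."""
    return abs(r) >= 4 * probable_error_r(r, n)


# Illustrative N only; the actual N behind each r is not given here.
N = 185
for r in (0.10, 0.271):
    print(r, round(probable_error_r(r, N), 3), exceeds_four_pe(r, N))
# 0.1 0.049 False
# 0.271 0.046 True
```

Under this assumed N, an r of about .18 or larger would pass the four-PE criterion. That threshold shifts with the actual number of cases.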

The findings of the present investigation are consistent with these earlier studies. Only one of the six coefficients for the Seashore battery (Table XIII) is as large as four times its P.E. This is the coefficient for Sense of Consonance (r = .271), which is considered the least reliable test of the group.

31 James L. Mursell, The Psychology of Music, p. 337. New York: W. W. Norton and Company, 1937.


These low coefficients substantiate Mursell’s conclusion that

“on the whole, the results seem amply to justify the frequently repeated assertion that performance on the Seashore tests is not significantly related to general intelligence and is not affected, within wide limits, by the intelligence of the subjects.”32


The general picture of the coefficients in Table XIII for the K-D and Kwalwasser test scores is similar to that for the Seashore tests. Only two tests show even a slight relationship to intelligence: K-D Tonal Memory (.378) and K-D Tonal Movement (.368). The others indicate a negligible relationship. It will be recalled that K-D Tonal Memory bore a substantial relationship to ability in the college subject of dictation. The relatively high correlation between K-D Tonal Memory and intelligence is thus consistent with the relationship of intelligence to dictation ability.


Two of the four Measures of Musical Back- 
ground, Discrimination of Mood and Rhyth- 
mic Discrimination, evidence a slight relation- 
ship to intelligence, the coefficients being .295 
and .256 respectively. The first of these 
measures employs a technique little used in 
the field of music testing, although this type 
of discrimination has become an increasingly 
important phase of school music activity. 
The two remaining tests of this group show a 
negligible relationship to intelligence. 


In summary, it seems evident that only 
five of the twenty-two music tests that were 
correlated with the intelligence test show even 
a slight relationship to intelligence, the others 
indicating negligible relationship. The con- 
clusion to be drawn is that there is little or 
no relationship between intelligence and musi- 
cal talent, as measured by the existing so- 
called tests of musical talent. Various explan- 
ations may be suggested for the small size of 
the coefficients here reported; e.g., unreli- 
ability of the tests or lack of actual relation- 
ship between intelligence and the tests. No 
claim is made that intelligence and musicality 
are related only to the degree suggested by 
these data. The evidence is presented merely 
for what it may be worth, in comparison with 
other evidence. 
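The first explanation offered, unreliability of the tests, can be weighed with Spearman's correction for attenuation. This estimates what an observed r would be if both measures were perfectly reliable: r / √(r_xx · r_yy). The study reports no reliability coefficients, so the reliabilities in the example are hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: estimated r between error-free true scores.

    r_xy: observed correlation; r_xx, r_yy: reliability of each measure.
    A result above 1.0 signals that the inputs are mutually inconsistent.
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical: an observed r of .20 between a music test with reliability .50
# and an intelligence test with reliability .90.
print(round(correct_for_attenuation(0.20, 0.50, 0.90), 3))  # 0.298
```

Under these assumed reliabilities, the corrected coefficient remains small.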

³² Ibid., p. 336.


[Vol. 10, No.1 


TABLE XIII 


COEFFICIENTS OF CORRELATION BETWEEN
INTELLIGENCE TEST SCORES AND
MUSIC TEST SCORES


Tests                                       r

Seashore Sense of Pitch
Seashore Sense of Intensity               .103*
Seashore Sense of Time                    .097
Seashore Sense of Rhythm
Seashore Sense of Consonance              .271
Seashore Tonal Memory
Kwalwasser Melodic Sensitivity
Kwalwasser Harmonic Sensitivity
K-D Pitch Discrimination                  .074*
K-D Intensity Discrimination
K-D Time Discrimination                   .047*
K-D Quality Discrimination                .036*
K-D Tonal Movement                        .368
K-D Tonal Memory                          .378
K-D Melodic Taste
K-D Rhythm Discrimination
K-D Pitch Imagery
K-D Rhythm Imagery
Background Discrimination
Background Recognition of Themes


* Coefficients less than four times their probable
errors.


THE RELATIONSHIP OF INTELLIGENCE TEST
SCORES TO PROFESSORS’ ESTIMATES
OF ABILITY


In Table XIV, coefficients of correlation 
between intelligence test scores and profes- 
sors’ estimates of ability are presented. 
According to these data, the estimates of 
achievement and of application show a slight 
relationship to intelligence, while the other 
estimates indicate a negligible relationship. 
It has been stated earlier that musical ability 
is positively associated with intelligence when 
true musical criteria are used. Investigators 
have employed teacher ratings as criteria of 
musical ability with some success. Mursell 
states his opinion of the value of teachers’ 
estimates in the following words: 


“Since teacher ratings on musical aptitude 
furnish a very obvious validation criterion 
for music tests, and in fact are often used 
as determiners of talent levels by various
research workers, it is very desirable to
form some opinion as to their correctness.”³³


Of his own investigation in which certain 
pupils were rated by as many as five teachers, 
he says that: 


“In general the extent of agreement was
strikingly high, the correlations being over
.90. This would seem to be evidence for
objectivity, and to a less degree for validity,
though it is by no means conclusive.”³⁴


TABLE XIV 


COEFFICIENTS OF CORRELATION BETWEEN 
INTELLIGENCE TEST SCORES AND 
PROFESSORS’ ESTIMATES 


P.E.     N

.060    126
.057    126
.056    126
.060    126
.058    126
.062    126
.060    126


* Coefficients less than four times their probable 
errors. 


Professors’ Estimate 


Talent 
Application 
Achievement 
Rhythmic Action 
Tone Quality 
Technique 


Although some other studies have met with 
less favorable results, Mursell contends that, 


“on the whole, we may tentatively conclude 
that teacher-ratings of musicality furnish 
an important and tolerably diagnostic 
sign.”³⁵


If teachers’ estimates are considered reliable indices of ability, and if ability shows a substantial relationship to intelligence, the coefficients calculated for this study would be expected to be substantial in size. The
fact that only two of seven estimates are sig- 
nificant in size (more than four times their 
probable errors) seems to indicate that the 
estimates themselves may be unreliable, that 
perhaps some teachers do not thoroughly 
understand the traits to be estimated, and 
therefore are not able to judge them accu- 
rately. 


THE CORRELATION OF CERTAIN MEASURES 
THAT APPEAR TO BE RELATED 


Tests purporting to measure the same trait. 
—Certain tests that, according to their titles, 
purport to measure the same trait have been 

³³ Mursell, op. cit., p. 135.


³⁴ Ibid., p.
³⁵ Ibid., pp. siis-16.
TABLE XV


COEFFICIENTS OF CORRELATION BETWEEN 
TESTS PURPORTING TO MEASURE 
THE SAME TRAIT 


Tests > a 


Seashore Sense of Rhythm 

Background Rhythmic 
Discrimination 

K-D Rhythm Discrimina- 
tion 

Seashore Sense of Rhythm 

Seashore Sense of Pitch 

K-D Pitch Imagery 

Background Rhythmic 
Discrimination 

K-D Rhythm Imagery.- - - - 

Seashore Sense of Time 

K-D Time Discrimination - 

Seashore Sense of Con- 
sonance 

Kwalwasser Harmonic 
Sensitivity 

 : +o Discrimina- 


— Rhythmic 
Discrimination 

Seashore Sense of Intensity 

-_" spienied Discrimina- 


. 053 


. 182* 


mf 
.110* 


. 053 
. 054 
. 054 
. 053 


onan. Tonal Memory 
K-D Tonal ej Si 
Seashore Sense of Rhythm 
K-D Rhythm Imagery. - -- 
Seashore Sense of Pitch 
K-D Pitch Discrimination ._—.025* 
K-D Rhythm Discrimina- 

tion 
K-D Rhythm Imagery... .—.013* 
K-D Pitch Discrimination 
K-D Pitch Imager 
Kwalwasser Melodic 

Sensitivity 
K-D Melodic Taste .053 159 


* Coefficients less than four times their probable 
errors. 


. 054 
. 053 


correlated and the data compiled in Table XV. The low coefficients indicate that, regardless of their titles, these tests are not measuring the same trait. The discrepancy may, however, be due in part to differences in approach and technique. Whitley, in a similar investigation of the Seashore and K-D Tests, found extremely low correlations and suggested low reliability of the tests as a possible reason.³⁶


Other measures that appear to be related. 
—Table XVI presents the data from the cor- 
relation of measures that appear to be closely 
related. The highest r computed during this 


³⁶ M. L. Whitley, “A Comparison of the Seashore and Kwalwasser-Dykema Tests,” Teachers College Record, XXXIII (May, 1932), 731-51.

TABLE XVI


COEFFICIENTS OF CORRELATION BETWEEN 
RELATED MEASURES 


Measures                                        r      P.E.    N

Professors’ Estimate of Talent
  Professors’ Estimate of Achievement         .690

Professors’ Estimate of Feeling
  Background Discrimination of Mood           .400

Professors’ Estimate of Rhythmic Action
  Seashore Sense of Rhythm                    .335

Professors’ Estimate of Rhythmic Action
  K-D Rhythm Discrimination

Professors’ Estimate of Rhythmic Action
  Background Rhythmic Discrimination                 .056    148


* Coefficients less than four times their probable 
errors. 


study is that of Professors’ Estimate of Talent 
with Professors’ Estimate of Achievement 
(.69). This may mean that the actual accom- 
plishment of the students in applied music 
tends to be in proportion to their talent. On 
the other hand, the relatively high r may be 
due to the fact that many teachers are unable 
to distinguish sharply between talent and 
achievement. A comparison between Professors’ Estimate of Feeling and Background Discrimination of Mood was thought to be of interest. The resulting r is .400, which indicates some degree of relationship, although not a marked one. Professors’ Estimate of Rhythmic Action and scores on the Seashore Sense of Rhythm test correlated .335, showing only a slight relationship. The two remaining coefficients are too low to be significant.


INTERCORRELATION OF MARKS IN 
COLLEGE COURSES 


Table XVII presents the data from the 
intercorrelation of marks in college courses. 
Coefficients for these subjects are listed as 
follows in order of size: dictation and sight 
singing (.569), dictation and history of music 
(.365), dictation and harmony (.255), har- 
mony and history of music (.239), sight sing- 
ing and harmony (.141), and sight singing 
and history of music (.058). Since dictation 
has been used in many of the tests described 
TABLE XVII


INTERCORRELATIONS BETWEEN MARKS 
IN COLLEGE COURSES 


                 Dictation    Sight Singing    Harmony

Dictation                         .569           .255
Sight Singing      .569                          .141*
Harmony            .255           .141*
History            .365           .058*          .239


* Coefficients less than four times their probable
errors.


in a preceding section as the indirect measure 
of sight singing, a relatively high correlation 
between marks in the two subjects (taught 
by the same teacher) is to be expected. The 
other correlations in this table are low. It 
seems probable that their size is affected by 
the low variability of the mark distributions 
in harmony and history of music. 
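The effect of low variability can be illustrated with the usual correction for direct restriction of range (Thorndike's Case 2). It estimates the correlation in an unrestricted group from two inputs: the restricted r, and the ratio k of the unrestricted standard deviation to the restricted one. The study does not report these standard deviations, so the value of k below is hypothetical. The correction is shown only to indicate the direction and rough size of the effect.

```python
import math


def correct_for_range_restriction(r: float, k: float) -> float:
    """Thorndike Case 2 correction for direct restriction of range.

    r: correlation observed in the restricted group.
    k: SD in the unrestricted group divided by SD in the restricted group (k >= 1).
    """
    if k < 1:
        raise ValueError("k should be >= 1 for a restricted sample")
    return r * k / math.sqrt(1 - r**2 + (r * k) ** 2)


# Hypothetical: if harmony marks spread only half as widely as they would in an
# unselected group (k = 2), the observed r of .239 with history of music
# would correspond to roughly .44.
print(round(correct_for_range_restriction(0.239, 2.0), 3))  # 0.442
```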


THE RELATIONSHIP OF PROFESSORS’
ESTIMATES TO MARKS IN
COLLEGE COURSES


In Table XVIII are the coefficients of 
correlation between Professors’ Estimates of 
students’ ability and marks in college courses. 
It should be noted that the estimates here 
referred to were made by teachers of applied 
music, in no case by the instructors in the 
four college courses. As a group, the Pro- 
fessors’ Estimates show the highest relation- 
ship to dictation, second to sight singing, 
third to history of music, and fourth to har- 
mony. The Estimate of Rhythmic Action 
ranks higher than any other, being very little 
higher for sight singing than for dictation. 
Two others, Estimate of Talent and Estimate 


TABLE XVIII
COEFFICIENTS OF CORRELATION BETWEEN
PROFESSORS’ ESTIMATES AND MARKS
IN COLLEGE COURSES


                 P. Talent   P. Appl.   P. Ach.   P. Rhythmic Action

Dictation          .418        .413       .387          .453
Sight Singing      .396        .329       .389          .463
Harmony            .166*       .169*      .197*
History            .256        .352       .402


Key: P. Talent, Professors’ Estimate of Talent;
P. Appl., Professors’ Estimate of Application; P.
Ach., Professors’ Estimate of Achievement; P.
Rhythmic Action, Professors’ Estimate of Rhythmic
Action.


* Coefficients less than four times their probable
errors.
of Application, show a marked relationship to
dictation. The Estimate of Achievement cor- 
relates highest with history of music marks, 
although the r’s for dictation and sight sing- 
ing are nearly as large. Five measures of the 
entire group of fourteen appear to indicate a 
marked relationship to ability in college 
courses. These figures seem to indicate that 
the teacher is able to make significant predic- 
tions concerning his students’ achievement in 
at least two of the criteria; namely, dicta- 
tion and sight singing. 

Regarding the role of the teacher in deter- 
mining musical ability, Pratt says: 

One can reasonably suppose that only 
the most incompetent teachers would be 
unable to do as well as the tests in esti- 
mating whether a pupil was good enough 
in his perception of pitch, intensity, 
rhythm, etc., to profit by private lessons. 


And the better teachers, since they could 
estimate these traits in five minutes and 
would then proceed to find out whether the 
pupil had any genuine musical ability, 
would do far more than the tests in their 
present state can possibly accomplish.³⁷


SUMMARY 


This section has dealt with minor problems 
in the relationship of various trait measures. 
The findings may be summarized as follows: 

1. The coefficients of correlation computed 
between intelligence test scores and music 
test scores substantiate the findings of other 
investigators, to the effect that there is little 
relationship between intelligence and musical 
talent as measured by the existing tests of 
musical talent. 

2. Estimates by professors on two traits, 
achievement and application, show a slight 
relationship to intelligence. Estimates on five 
other traits indicate a negligible relationship. 

3. Certain tests that purport to measure 
the same trait yield such low r’s, when scores 
are correlated, that it appears they are not 
actually measuring the same trait. Other apparently related measures, however, indicate some degree of relationship. These are:
Professors’ Estimate of Talent and Estimate 
of Achievement (.690), Professors’ Estimate 
of Feeling and Background Discrimination of 
Mood (.400), and Professors’ Estimate of 
Rhythmic Action and Seashore Sense of 
Rhythm (.335, denoting a slight relationship). 


³⁷ Carroll C. Pratt, The Meaning of Music, p. 147. New York: McGraw-Hill Book Company, 1931.
4. When the intercorrelations of marks in
four college courses in music are computed, 
the highest coefficient is that for dictation and 
sight singing (.569); the others show a slight 
or negligible relationship, probably because 
of the narrow spread of the marks in harmony 
and history of music. 


5. Professors’ estimates show a marked re- 
lationship to grades in dictation and sight 
singing, a slight relationship to marks in his- 
tory of music, and a negligible relationship 
to marks in harmony. 


GENERAL CONCLUSIONS 


It is evident that, as a whole, the music test 
batteries do not evidence sufficient predictive 
power to be used by themselves for guidance 
purposes, yet neither do they have so little 
value as to warrant discarding them entirely. 
Three tests are outstanding in comparison 
with the others in the batteries employed. 
These three tests are Background Discrimination of Mode, K-D Pitch Imagery, and K-D
Tonal Memory. While the two last-named 
tests are familiar and have been employed in 
many investigations, the first was completely 
untried at the time the data were gathered 
for this study. The Background Tests have 
since been revised and standardized. The 
fact that the Background Discrimination of 
Mode Test evidenced high predictive power 
for both college and professional success 
suggests that more experimental tests involv- 
ing such perceptual capacities as this test 
measures might well be developed. 


Although musical talent is, of course, the 
primary requirement for college success in 
music and for professional success in music, 
the individual who possesses that musicality 
will reach a limit above which he cannot rise 
unless his musical ability is accompanied by 
no small degree of intelligence. The evidence 
of the correlation coefficients and of the con- 
tingency coefficients calculated in the present 
study supports this statement. 
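The text cites contingency coefficients alongside product-moment r's. Pearson's contingency coefficient, C = √(χ² / (χ² + N)), summarizes association in a cross-tabulation, such as test level against rated professional success. The study's own cross-tabulations are not reproduced in this section, so the table below is invented purely for illustration.

```python
import math


def contingency_coefficient(table: list[list[int]]) -> float:
    """Pearson's contingency coefficient C for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi_square += (observed - expected) ** 2 / expected
    return math.sqrt(chi_square / (chi_square + n))


# Hypothetical 2 x 2 table: rows = high / low test score,
# columns = rated successful / not successful.
example = [[30, 10],
           [15, 25]]
print(round(contingency_coefficient(example), 3))  # 0.354
```

C cannot reach 1. For a 2 × 2 table its ceiling is about .707, so C values are not directly comparable with product-moment r's.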

The student who is successful in his college work, particularly in dictation and sight singing, is the one who has the greatest chance to succeed professionally.

Much has been said concerning the value 
of teachers’ estimates of ability. The evidence 
of the present investigation tends to support 
the conclusions of others, that professors’ 
judgments of a student’s ability are fairly 
reliable indices of his subsequent success as a
professional musician. 

The evidence gathered in this study tends 
to support an analytical theory of the nature 
of musical talent and achievement, rather 
than the single trait theory that musical talent 
is one unified trait. 

In conclusion, it may be said that the 
present study has evaluated as to prognostic 
worth the most widely used and carefully 
standardized of available tests, together with 
certain new and promising tests, a total of 
twenty-two music tests in all; has compared 
the prognostic values of these tests with those 
of certain other measures, including an intel- 
ligence test, ratings by instructors on various 
traits, and marks in college courses; has util- 
ized as criteria both success in college courses 
in music and success in music as a profession; 
has covered a time span of five years of test- 
ing and four additional years of elapsed time 
during which students might have demon- 
strated their professional success; has in- 
volved sufficient cases (185 as a total) to 
yield significant results; and has analyzed 
relationships in terms both of individual tests 
and of test batteries. 

Earlier studies in this area of investigation 
have failed to bring conviction that their re- 
sults are scientifically sound and conclusive. 
The present study marks the end of a period 
of controversy over the validity of musical 
talent tests as they exist at present. The data 
presented are such as to make this in certain 
respects a definitive investigation. 
A COMPARISON OF THE PERFORMANCE OF FRESHMEN AND
SOPHOMORES IN A BEGINNING COURSE IN PSYCHOLOGY* 


M. BRUCE FISHER
Fresno State College, California 


The history of the teaching of psychology 
as a separate subject does not go back much 
more than a half century. It entered the col- 
lege scene, in this country at least, as some- 
thing one took at the end of the philosophy 
curriculum, and it was usually called some- 
thing like mental or moral philosophy. When 
separate departments of psychology were set 
up, the courses were first open only to 
juniors and seniors. As the curriculum ex- 
panded, the elementary course was moved 
down to the sophomore year, at which point 
it has been very nearly fixed for the last 
twenty-five years. A relatively small number 
of institutions offer it to freshmen and an 
even smaller number of secondary schools 
teach psychology. 

One is led to wonder why this halt at the 
sophomore level should have occurred. It 
might be that certain work in zoological sci- 
ence has been considered prerequisite to the 
study of psychology, although we rarely find 
such a statement in college catalogues. Some 
have claimed that more than freshman 
maturity is needed, but this position seems no 
more logically taken with respect to psychol- 
ogy than any other subject matter in the 
college curriculum. A more important reason 
than these positions would seem to be that 
the places in the first-year students’ pro- 
grams have been so generally pre-empted by 
required courses in longer-established disci- 
plines: foreign language, mathematics, his- 
tory, English, and the older sciences. 

Whichever of these answers be correct, the 
question still remains whether sophomores 
do any better in elementary psychology than 
freshmen, and if so, why? It is with an 
approach to this problem that the present 
study is concerned. 

In 1937 the home economics curriculum at 
Rhode Island State College was changed, for 
certain administrative reasons, so that a re- 
quired one-semester course in psychology, 
formerly given in the first semester of the 
sophomore year, was moved to the first 

* This study was conducted at Rhode Island State College. 
semester of the freshman year. During the
first year of the new arrangement both fresh- 
men and sophomores in the curriculum were 
enrolled in the course. The freshmen, who 
numbered forty-four, were in a section by 
themselves. The thirty-seven sophomore home 
economics students were in a section totalling 
sixty-five, which also included students from 
the sophomore, junior and senior classes of 
other curriculums in the college. All stu- 
dents in the course used the same text; they 
listened to lectures that were approximately 
the same; the same general procedure was 
followed in class meetings, with the same 
demonstrations and experiments. The only 
planned difference between the freshmen and 
the others was that more attention was given 
in the freshmen section to the matter of how 
to study for a college course, and this course 
in particular. All students took the same ob- 
jective quizzes and examinations, including 
the final examination. 


Except for the smallness of the numbers 
involved, there was here presented an oppor- 
tunity to compare the performance of other- 
wise similar groups of first- and second-year 
students in the same situation. 

The first part of the analysis used the 
matched-group technique. Thirty-three pairs 
of home economics students, each comprising 
a freshman and a sophomore, were selected 
from the two classes. They were, of neces- 
sity, matched on sex, and also on age at col- 
lege entrance, and on rank in the American 
Council on Education’s Psychological Exam- 
ination. It is to be noted that this matching 
made the sophomores a year older than the 
freshmen, as well as giving them a year’s 
experience in college. The success of the 
matching on the selection criteria is indicated 
in Table I, which gives the averages and an 
indication of the significance of the differ- 
ences. On age, 26 of the 33 pairs were within 
12 months. The larger differences were all in 
cases where both individuals were in the 
upper part of the age range. The Psycholog- 
ical Examination scores were transmuted into 
percentile ranks for purposes of matching, using the published distributions of all college women for 1936 and 1937. In these terms the greatest difference in matched pairs was five ranks, and the average difference was less than two ranks. Twenty-eight of the thirty-three pairs were within three percentiles. The Psychological Examination raw scores were derived from the published tables of equivalence, using the 1937 scores as a base. It will be noted that the differences on the Psychological Examination are in favor of the freshmen and that the critical ratios are small.


TABLE I
MATCHING OF GROUPS

                                                          Mean
Age in months at college entrance      Freshman          219.1
                                       Sophomore         220.6
Percentile on A.C.E. Psychological     Freshman           55.1
  Examination                          Sophomore          54.8
Raw score on A.C.E. Psychological      Freshman          177.1
  Examination                          Sophomore         175.5


The differences in performance of the two groups are presented in Table II. Three major objective quizzes were given during the first half of the semester, and their total appears here as the midterm score. About the middle of the second half of the semester another objective examination was given, indicated here as “3rd Q.” The final examination was objective. The “Total” is the raw sum of the other scores listed plus a number of short quizzes given during the second half of the semester. The general nature of the comparison is clear.
The sophomores did slightly, but consistently, 
better than the freshmen at all points during 
the course, although none of the critical 
ratios is as great as three. 
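The text does not say exactly how its critical ratios were computed. For matched groups, the critical ratio is the mean difference divided by its standard error. With pairs, that standard error can be taken from the differences within pairs, which gives credit for the matching. The sketch below assumes this paired form, and its scores are invented.

```python
import math
from statistics import mean, stdev


def paired_critical_ratio(freshmen: list[float], sophomores: list[float]) -> float:
    """Critical ratio of the mean sophomore-minus-freshman difference for matched pairs."""
    if len(freshmen) != len(sophomores) or len(freshmen) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [s - f for f, s in zip(freshmen, sophomores)]
    se_mean_diff = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se_mean_diff


# Hypothetical scores for five matched pairs (same order in both lists).
fr = [100, 110, 95, 120, 105]
so = [106, 108, 99, 119, 112]
print(round(paired_critical_ratio(fr, so), 2))  # 1.53, below the conventional cutoff of 3
```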

Another aspect of the comparison is the 
difference between the shapes of the distribu- 
tion curves, as indicated by the measures of 
skewness. The sophomore curve is close to 
normal at each comparison, but the freshmen 
curve shifts from a definitely positive skew 
to normality. This sort of picture leads to 
the conclusion that the less able freshmen 
were expending extra effort, under pressure 
of the necessity to stay in college. There was 
considerable case-study evidence that this was 
true, and that a number were successful in 
improving their performance. Many more 
freshmen than sophomores took advantage of 
the instructor’s office hours for remedial 
work. 
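The text does not name its measure of skewness. One common choice in work of this period is Pearson's second coefficient, Sk = 3(mean − median)/SD. It is positive when a tail of high scores pulls the mean above the median, and negative when a tail of low scores pulls the mean below it. The sketch assumes that measure, and the scores are invented.

```python
from statistics import mean, median, pstdev


def pearson_skewness(scores: list[float]) -> float:
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have no spread")
    return 3 * (mean(scores) - median(scores)) / sd


# One very high score gives a positive skew; a symmetric set gives zero.
print(round(pearson_skewness([50, 52, 53, 55, 56, 58, 90]), 2))  # about 0.97
print(pearson_skewness([40, 50, 60]))                            # 0.0
```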

TABLE II
COMPARISON OF PERFORMANCE

Mean
112.9
116.5
317.1


A tendency is apparent here, particularly in the series of medians, for the freshmen to improve their standing with respect to the sophomores during the course of the semester. This is one indication of the factor that appears to be most important in determining
the difference between the two classes, 
namely, one year’s experience in college work. 
This assumes, of course, that the small num- 
ber of cases did not involve serious sampling 
errors, but rather operated through the sta- 
tistical machinery to make the critical ratios 
smaller than they should have been. There 
is corroborative evidence that such is the case 
in the records of two other classes of fresh- 
men home economics students who have since 
taken the course. These groups have done a 
little less well than sections of sophomores, 
juniors, and seniors from other schools of the 
college who were taking the course at the 
same time, but they improved their relative 
position during the semester. They also had 
similarly skewed distribution curves of marks. 


Another factor besides college experience 
might have been operative in the situation. 
It is true that the sophomores averaged 13.5 
months older at the time they took the course, 
but the available evidence on the maturation 
of mental ability between the ages of 18 and 
20 years would lead us to expect little from 
this influence. 
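The 13.5-month figure follows from the mean ages at college entrance in Table I. This assumes, as the design implies, that the sophomores took the course about twelve months after entering college and the freshmen took it on entry:

$$(220.6 + 12) - 219.1 = 13.5 \text{ months}$$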


Some further comparisons are possible 
from the set of correlation coefficients in 
Table III. For their calculation, data from 
all the first-year (N — 44) and second-year 
(N = 37) home economics students were 
used. The coefficients between age and total 
score indicate that whatever advantage age 
might be in the freshman year has probably 
more than disappeared by the sophomore 
year. 


Two elements of consistency are apparent 
from an inspection of the coefficients. In the 
first place, the sophomores show a consistently 
higher correlation between the results of the 
Psychological Examination and performance 
in the course than do the freshmen. In no 
case are the pairs of coefficients significantly 
different, but the trend is evident. This is a
result we have not been led to expect. Usually 
the trend of such relationships as this, be- 
tween a scholastic aptitude test and perform- 
ance in college, has been reported as having 
a downward direction as one moves away 
from the freshman year. 
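The paper does not say how it tested the pairs of coefficients. Whether two coefficients from independent groups differ reliably is usually tested with Fisher's z transformation, and the sketch below assumes that method. It uses this study's group sizes (44 freshmen, 37 sophomores) with hypothetical coefficients.

```python
import math


def compare_independent_rs(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two independent Pearson r's."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se


# Hypothetical: r = .45 for 37 sophomores versus r = .30 for 44 freshmen.
print(round(compare_independent_rs(0.45, 37, 0.30, 44), 2))  # about 0.76, far from significant
```

With groups this small, even a sizable gap between two coefficients does not approach significance. This is consistent with the remark that no pair differs significantly.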


The second consistency that is apparent is 
the improvement during the semester, on the 
part of both groups, in the extent to which 
performance correlates with the scholastic 
aptitude test. The sophomores start higher 
and finish higher, but both improve. Here is 
evidently an indication that the students of 
both classes are becoming adjusted to the 
course, to the ways of the instructor, and
perhaps most important, to the objective- 
type examination, which was new to all the 
freshmen and to a large proportion of the 
sophomores. 


Three other pairs of coefficients in the 
table show the normal degree of relationship 
between successive measures of performance 
in college courses. 


By way of summary and conclusions: 


1. The performances of groups of freshmen
and sophomores matched on college curric- 
ulum, sex, age at college entrance, and scho- 
lastic aptitude test rank were compared in a 
beginning course in psychology. 

2. The differences between the two groups 
were not statistically significant and de- 
creased during the semester. 


3. Correlation between scholastic aptitude and performance is higher for sophomores than for freshmen and increases during the semester.


4. It is suggested that the most important 
factor determining these differences in marks 
is the adjustment of the student to college 
courses and to this course. 


TABLE III 
CORRELATIONS 


Age          -.06 ± .10      -.39 ± .10

Midterm       .26 ± .10
If these girls are a fair sample of college
students, we should always expect a slight 
inferiority on the part of freshmen as com- 
pared with sophomores. But we should also 
expect to be able to overcome this inferiority 
to a considerable extent, by some satisfactory 
orientation procedure and by specific direc- 
tions on how to study. 

It is reasonable to suppose that most 
teachers of psychology would allege their 
subject matter to have some valuable transfer
value in the academic activities and personal 
adjustment of the student. If this be one of 
the aims in teaching it, then, when administrative arrangements permit, this course
may as well be offered to freshmen and to 
high-school pupils as to upperclassmen. The 
sooner these students can have the opportu- 
nity to learn the practical things psychology 
has to offer, the longer they will have to profit 
by it. 





AN EXPERIMENTAL STUDY IN REMEDIAL TEACHING 
IN COLLEGE FRESHMAN MATHEMATICS* 


JACK WOLFE
Brooklyn College, New York 


The problem.—This study was undertaken 
to discover whether a definite program of 
remedial instruction based solely on the pre- 
requisite mathematics would prove more 
effective in improving the students’ perform- 
ance in the trigonometry course than what- 
ever incidental review may be included 
within the course itself. Thus the main in- 
vestigation was concerned with those students 
who were found at the beginning of the term 
to be below average in the prerequisite skills 
that find significant application in the trigo- 
nometry course. 


The importance of this problem is peculiar 
to courses of a sequential nature in which it is
generally assumed that the students have ac- 
quired and retained the prerequisite skills. 
At Brooklyn College the trigonometry course 
is taken mainly by lower freshmen, and it 
precedes the course in college algebra. The 
fact that only about twenty per cent of all 
the trigonometry students had taken mathe- 
matics the preceding term appeared indica- 
tive of the desirability of having some of the 
students engage in a definite program of re- 
view work in conjunction with the trigo- 
nometry course. Thus the direct remedial 
program was investigated as an efficient 
means of affording administrative cognizance 
of the “laws of forgetting.” 


Data.—The data of this experiment were
collected from the college records of the stu- 
dents who were the subjects in this study and 
by direct testing with written objective exam- 
inations that lent themselves to aid in diag- 
nosis as well as to measure achievement. In 
addition to the various tests given to the 
remedial group, relatively large general groups 
of students of all levels of ability were tested 
in the mathematics prerequisite to trig- 
onometry.* 
301 students at the beginning of the trig-
onometry course; 

280 students at the end of the trigonometry 
course (most of these students had taken 
the same test at the beginning of the 
trigonometry course) ; 

403 students at the end of the college 
algebra course (of these, 146 had taken 
the same test in the trigonometry course, 
and 257 had not taken the test pre- 
viously). 


Many of the data thus acquired went far 
beyond the direct needs for the main problem
of this investigation and were utilized in 
certain subproblems that were studied. 

Procedure.—The students who scored
below average in the initial test, comprising
those prerequisite topics that are employed 
significantly in the trigonometry course, were 
separated into two groups, the control group 
and the experimental group, containing 63 
and 66 students, respectively. The trigonom- 
etry instruction of these two groups may be 
considered uniform, since they were prac- 
tically equally represented in each of the 
thirteen trigonometry sections participating in 
the experiment. These two groups were also 
equivalent in the following characteristics at 
the beginning of the term: 


1. score on initial test
2. chronological age 
3. decile group on psychological placement 
test 
4. lapse of time since completion of last 
preceding mathematics course 


The remedial instruction was administered 
to the experimental group outside of regular 
class hours. Attendance was voluntary but 
was recorded, and letters were written to the 
absentees informing them when to report for 
a “make-up” lesson. All communication with 
the 66 students in the remedial group was 
effected by means of the United States mail 
service; thus the teachers did not know which
students had scored low on the initial test.

After the completion of a series of eight
general review lessons with the assembled 
experimental group, diagnostic tests were 
given in arithmetic, algebra, and geometry. 
By meeting the students by appointment in 
small groups of four students per hour, the 
investigator then began the individualized 
remedial instruction for each student accord- 
ing to his needs. For each student the reme- 
dial teacher had at hand a folder containing 
his work sheets and a summarizing guide 
sheet indicating his difficulties. 

Individualized instruction—At each hour 
as soon as the first student appeared his guide 
sheet was examined. If the first topic listed 
there for review was, say, fractions in arith- 
metic, he was given a diagnostic test on that 
particular topic. Similarly, each student ap- 
pearing at that hour was given the diagnostic 
test corresponding to the first topic listed for 
review on his guide sheet. The topics in 
which each student took these tests were 
those in which group instruction for him had 
not been successful and in which he there- 
fore required individual diagnosis. 

The individualized instruction proper began 
with the diagnosis based on the student’s test 
work and responses to supplementary ques- 
tions. A few brief, specific, numerical illus- 
trations usually sufficed to destroy the stu- 
dent’s confidence in his wrong procedure. 
Then a method of correct procedure was 
developed, since remedial work usually in- 
volves both reteaching and the undoing of 
many bad habits. Generally the desired 
method was developed inductively from 
numerical illustrations. The instructor was 
concerned with the student’s understanding 
the reason behind the method, even though 
the reasoning was on the level of specific 
numerical cases, rather than with his obe- 
dience in accepting an abstract and general 
deductive proof. The time for explanations 
to individual students ranged generally from 
two to ten minutes. If more time was
required, the unit was resolved into smaller
parts.

Upon the satisfactory completion of the 
test or the supplementary problems on a 
given topic, the student continued in like 
manner with his remaining topics. This gen- 
eral procedure was repeated for each student 
in rotation. When a student had completed 
his test while the teacher was occupied with 
another student, he immediately began the 
test on the next topic indicated on his guide
sheet. Thus each student was working until 
his turn was reached for the individualized 
instruction. 
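The hourly rotation described above can be sketched as a simple round-robin over the students present. This is a minimal illustration only, assuming (hypothetically) that the teacher handles exactly one guide-sheet topic per visit and that a student with topics remaining rejoins the end of the rotation, working on his next diagnostic test while he waits; the study does not state how long a visit lasted, and the function name and sample topics are hypothetical.

```python
from collections import deque


def rotation_schedule(guide_sheets: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return the order in which the teacher reaches each (student, topic) pair.

    Assumption (not stated in the study): one topic is handled per visit, and a
    student with topics left goes to the back of the rotation, working on the
    diagnostic test for his next topic until his turn comes again.
    """
    rotation = deque(
        (student, deque(topics)) for student, topics in guide_sheets.items() if topics
    )
    order: list[tuple[str, str]] = []
    while rotation:
        student, topics = rotation.popleft()
        # Diagnosis and reteaching of this student's current topic.
        order.append((student, topics.popleft()))
        if topics:
            rotation.append((student, topics))
    return order


if __name__ == "__main__":
    # Four students per hour, as in the study; the topics are hypothetical.
    sheets = {
        "Student A": ["fractions", "exponents"],
        "Student B": ["factoring"],
        "Student C": ["radicals", "logarithms", "similar triangles"],
        "Student D": ["angle sense"],
    }
    for student, topic in rotation_schedule(sheets):
        print(student, "-", topic)
```

A round-robin keeps every student busy between visits, which matches the study's point that each student "was working until his turn was reached"; a real session would let the time per visit vary with the difficulty found.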

In the complete dissertation, copies of 
which are in the library of New York Univer- 
sity, appears a detailed discussion of the 
method of the individualized reteaching for 
each topic in the remedial syllabus, as well as 
an analysis of its effectiveness in accomplish- 
ing its immediate objective. The following 
topics comprised the remedial syllabus: 


Arithmetic: operations with zero, removing
parentheses, decimals, fractions, radicals,
approximating √n as a decimal.

Algebra: removing parentheses, solving 
Pythagorean equation, solving linear 
equation, factoring, solution of equation 
by factoring, fractions, exponents, loga- 
rithms. 

Geometry: Pythagorean theorem, 30° right 
triangle and 45° right triangle, similar 
triangles (setting up the proportion and 
solving for one unknown), angle sense. 


The mean attendance for the complete series
of remedial lessons was 14.4 recitation hours,
or thirty per cent of the time of the regular
trigonometry course.


Results.—The conclusions are as follows:
1. This study corroborated the findings of 
many other investigations: 


a. that many students entered their current
courses inadequately prepared in the
background skills;

b. that the incidental treatment of the
preliminary topics did not reduce these
deficiencies enough, for a significant
number of students, to permit successful
performance of the work of the current
course;

c. that a remedial program served to
improve the students’ work in the current
course and to reduce the failing rate.


2. The follow-up phase of the investigation 
beyond the semester of the remedial assistance 
was considered significant in estimating the 
lasting power developed by the remedial pro- 
gram. In the subsequent course, college 
algebra, the passing rate of the remedial group 
was higher than that of the control group and 
even of the total group. The next higher 
mathematics course, analytic geometry, was 
not required, and twelve per cent of the
control group elected to take it, while twenty-one
per cent of the experimental group did so, 
although there was a slightly higher propor- 
tion of science students in the former group. 
Only seven per cent of the control group com- 
pleted the analytic geometry course, however, 
whereas nineteen per cent of the experimental 
group did so. All of these students passed the 
course with no essential difference in the final 
marks for the two groups. The remedial pro- 
gram apparently produced a superior founda- 
tion that outlived the period of direct assist- 
ance. 

3. The remedial program appeared instru- 
mental in raising almost all of the “potential” 
failures in trigonometry to the level of the 
D mark, but in college algebra there was a 
relative scarcity of both F’s and D’s with an 
unusual concentration at the C mark. That 
the results in trigonometry were not more 
striking might have been due to the fact that 
the remedial instruction was not completed 
until almost the end of the trigonometry 
course. Table I indicates the passing rate for
each course:

TABLE I
SUCCESS IN PASSING TRIGONOMETRY AND COLLEGE ALGEBRA

                                Control     Experimental   Total
Course                          Group       Group          Group
                                (per cent)  (per cent)     (per cent)
Trigonometry                       ...          98            91
College algebra                    ...          96            92
Both courses in first attempt      77           94            84

The differences in the passing rates in
trigonometry and in college algebra were in
favor of the remedial group, when compared
with the control group, the critical ratios
being 2.5 and 1.1, respectively. Although
these differences are not statistically certain,
they become significant as corroborative
evidence when viewed along with other
observations. The difference in passing rates for
college algebra is more significant than it
appears numerically to be, for the control
group had already been freed of its worst
students by failure in trigonometry, whereas
there was virtually no elimination in the
remedial group. By conducting the remedial
program with the weaker students for part of
one semester, a passing rate of ninety-four per
cent was achieved for the year of freshman
mathematics, whereas the control group
showed a corresponding passing rate of
seventy-seven per cent.

4. With reference to the skills in the
prerequisite topics, the experimental group and
the control group were equivalent at the
beginning of the trigonometry course. After the
remedial aid, however, the experimental group
performed better in the prerequisite topics
than not only the control group but also the
originally superior total group. More
specifically, at the end of the trigonometry course
the remedial group had attained greater skill
in the prerequisite topics than even the
students of the total group who received “A” as
their final mark, while the corresponding
performance of the control group was the same
as that of the “D” students. At the end of
the college algebra course, during which term
no remedial assistance was given, the
experimental group again showed its superiority
in these prerequisite skills over the students
who received the final mark of A in college
algebra, while the corresponding performance
of the control group was between the levels
of the D and the C students. Whatever
weaknesses in trigonometry or college algebra may
have existed in the experimental group after
the remedial program could not be attributed
to persistent deficiencies in background skills.


Each group showed significant improvement 
in the prerequisite skills at the end of the 
trigonometry course. The college algebra 
course enabled the students to maintain their 
level of those skills but did not tend to 
raise it. 


The superiority of the mean scores of the 
experimental group over those of the control 
group for the second and third tests is
statistically certain, the critical ratios being 12.6
and 6.8, respectively. 
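The critical ratios quoted in this section (12.6 and 6.8 for the mean scores, 2.5 and 1.1 for the passing rates) can be illustrated with the usual large-sample formulas for two independent groups. The study does not state exactly how its standard errors were computed, so this sketch assumes the textbook forms; the function names and the sample inputs are hypothetical, not data from the experiment.

```python
import math


def critical_ratio_means(m1: float, sd1: float, n1: int,
                         m2: float, sd2: float, n2: int) -> float:
    """Difference between two independent group means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se_diff


def critical_ratio_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Difference between two independent proportions (0 to 1) divided by its standard error.

    Undefined when both proportions are exactly 0 or 1, since the standard error is then zero.
    """
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se_diff


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(round(critical_ratio_means(60, 10, 100, 55, 10, 100), 2))    # 3.54
    print(round(critical_ratio_proportions(0.50, 100, 0.40, 100), 2))  # 1.43
```

Many texts of the period treated a critical ratio of about 3 as practical certainty; that convention is assumed here, and it is consistent with the text reading 12.6 and 6.8 as certain while calling 2.5 and 1.1 "not statistically certain".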

5. In general, of two groups, the one higher 
on the test on the prerequisite topics at any 
of the three testings also earned higher final 
marks in that course, and conversely. This 
relationship did not appear for individual 
students, however, but was observed for 
groups selected and ranked by either basis. 
TABLE II
COMPARATIVE GROWTH WITH REGARD TO SCORE ON TEST ON PREREQUISITE SKILLS

                                    Mean Score of:
                                    Control      Experimental   Total
Time of Test                        Group        Group          Group
                                    (per cent)   (per cent)     (per cent)
Beginning of trigonometry course
End of trigonometry course
End of college algebra course


6. The test in prerequisite skills, admin- 
istered at the beginning of the term of trig- 
onometry, was found to have underestimated 
the abilities of the students with a greater 
lapse of time since their last preceding mathe- 
matics course and to have overestimated rel- 
atively the abilities of the “most recently 
prepared” group. This observation was an- 
other instance in verification of the lack of 
retention of subject matter. It served also to 
support the hypothesis that a single test pro- 
duces a static portrayal of the student’s func- 
tional knowledge and does not indicate his 
latent knowledge that would become func- 
tional again upon review and refreshment. 

7. Of the students who were initially weak 
in the prerequisite mathematical skills, those 
who had scored above average on the Thur- 
stone Psychological Examination had no 
better passing rate in trigonometry than those 
whose score on the psychological examination 
was below average. Thus a higher general in- 
telligence was not sufficient to compensate for 
specific weaknesses in the prerequisite mathe- 
matical skills. 


Summary.—The lack of retention of sub- 
ject matter is an indisputable phenomenon 
that must be recognized in any program of 
education that is to be not merely consistent 
within itself and “logically” sound, but also 
applicable in the world in which we live. The 
assumption that the students had at one time 
actually mastered the prerequisite topics and 
have retained their skills may lead to a well-knit
system of education on paper, but the
failure of this assumption in practice may be
a chief factor contributing to some of our
educational ills, especially those of student
failure and its consequent problems. This
assumption, that the students actually have
the prerequisite skills because they have
taken the prerequisite courses at some time
during their high-school period, is so glaringly
false that any system of education depending
strongly upon the truth of this assumption
cannot be highly efficacious in accomplishing
its objectives.

In support of the hypothesis that practical
disregard of the actual knowledges, skills, and
abilities of our students as they come to us
is a source of diminished efficiency of our
teaching, we find that the subject most
strongly of a sequitur nature, mathematics,
has the greatest failing rate of all subjects in
elementary school,² in high school,³ and in
college,⁴ according to various reports.
Likewise, the second-ranking subject of a sequitur
nature, foreign languages, has the second
highest failing rate in high school⁵ and in
college.⁶

Undoubtedly there remain inherent diffi- 
culties of the subject itself, even for ade- 
quately prepared students, but regard for the 
students’ welfare demands a constructive 
treatment of the problem of background 
deficiencies and not a “passing-the-blame” 
dismissal of it. 

As a result of conducting with college
freshmen an experiment in remedial work in
the skills prerequisite to trigonometry, the 
writer believes that the most efficient way, 
administratively and educationally, for an 
institution to meet the problem of review is 
by the introduction of a definite, specific, and 
individualized remedial program for the 
students who need it. 

From an analysis of the types of errors, it 
is obvious that there is no one concept which, 
if developed in the student, will eliminate all 
errors. They are varied and specific;
the remedial work should be designed to
remove the specific defects. Mathematical
ability includes the sum of many specific
mathematical skills. Although interrelationships are
present, it is not within the capabilities of
most of our students to comprehend and
apply them without specific direction. Specific
remedial instruction is not to be conceived as 
treating each topic or skill as if it were en- 
tirely independent of all others, but rather as 
a method of indicating explicitly the relevant 
transfer and relationships. 

The remedial program in this investigation 
served to reduce the failing rate to an insig- 
nificant amount, probably because of the 
tremendous reduction in background defi- 
ciencies, but other concomitant factors must 
not be ignored. The improvement in the stu- 
dents’ attitude toward the current work of 
the trigonometry course was an important 
characteristic that aided them to sustain 
themselves in the trigonometry course when 
they were technically ready for it. Further- 
more, the gains of the remedial group must be 
associated at least in part with the additional 
time spent on mathematics. However, after 
observing the background errors of the con- 
trol group that had persisted throughout the 
trigonometry course, the investigator doubts 
that merely more time on trigonometry would 
have been highly effective for the weaker 
students, since it would not have attacked 
the difficulty at its source. Nor is extensive 
review in class educationally desirable or 
efficient. The remedial program may well be 
viewed as an opportunity offered for the 
students who need it, wherein they must con- 
tribute their own time and effort without 
receiving course credit. Thus, the classwork 
will not be disrupted and the quality and 
quantity of the work will not degenerate. 

The wholesome morale between the reme- 
dial students and the remedial teacher gives 
rise to the hypothesis that learning will be 
improved when the role of the teacher is 
teaching and not judging. 

The evidence indicates that the remedial 
program did not “baby” the students so that 
they became dependent upon the continuation
of such aid in their future work. On the
contrary, it appeared to have laid a superior 
foundation that outlived the immediate 
period of assistance and enabled the students 
later to sustain themselves independently 
and better than if they had not received the 
earlier review. 

The proper solution of the problem of ade- 
quate review and reteaching depends mainly 
on the administration of the institution or 
the department rather than on the individual 
teachers. “Would it be worthwhile for a uni- 
versity to set aside a brief period for indi- 
vidual diagnosis and remedial work, or is it 
better to begin new work on the false
assumption that the students know the
fundamentals?”⁷

The students admitted to the day session 
of Brooklyn College are a scholastically select 
group, as attested by the relatively high 
entrance requirements and by the fact that 
on the Thurstone Psychological Examination 
the median score of the Brooklyn College 
entering freshmen ranked twenty-first in a 
descending order listing of the median scores 
of 323 colleges. Such students, therefore, 
may be assumed to have the scholastic ability 
to profit from a program of remedial instruc- 
tion. It is likely that the favorable results 
observed may not appear to the same extent, 
if the remedial program is administered to a 
group less capable of profiting from the 
assistance. The findings, therefore, must not 
be interpreted as absolute in the light of the 
remedial program alone, but rather as the
results of a complex set of many factors that
differ somewhat from time to time and place 
to place. 

In an experiment of this nature, any re- 
sults other than the mere statement of the 
specific observations must be viewed as 
hypotheses supported by the data rather than 
as conclusions proved deductively. 


⁷ R. Schorling, “The Need for Being Definite with Respect
to Achievement Standards,” Mathematics Teacher,
(May, 1931), 320.





A DIFFERENTIATED PROGRAM FOR DULLER 
HIGH SCHOOL PUPILS* 


Grant D. Morse, Superintendent 
Saugerties Public Schools, New York 


The problem.—This study was undertaken
to determine whether the educational needs 
of the duller high school pupils in certain New 
York State village superintendencies are 
being met more nearly by an adjusted offer- 
ing or by the traditional Regents program. 

Six schools were studied. Three of them 
were Regents schools that provide the same 
offering and maintain substantially the same 
requirements for all pupils. The other three 
attempt to adjust their offering to meet the 
needs of the duller pupils. Some of the adjust- 
ments are: (1) a curriculum constructed to 
help a pupil do whatever he is going to do 
rather than to prepare him for admission to 
college; (2) adjustment of marks to avoid 
discouraging a pupil who is working nearly 
to capacity; (3) the presentation of subject 
matter, which, like the curriculum, is func- 
tional rather than of college entrance type; 
and (4) the offering of a diploma based on 
local teachers’ examinations, not State 
Regents examinations. 

Three criteria were employed to determine 
whether the traditional Regents schools or 
the experimental schools succeed more nearly 
in adjusting their offering to the needs of 
their duller high school pupils: (1) the
extent to which the pupils continue in school;
(2) the degree to which they enjoy the ex- 
perience, coupled with a conviction that the 
experience is socially and culturally valuable; 
and (3) the extent of actual attainment of 
some of the recognized aims of secondary 
education. 

The importance of the problem is evidenced 
by the fact that the holding power of the 
secondary schools in New York State is not 
as high as it might be. 

From the ninth grade on, attendance in 
New York drops off sharply, until, in the 
twelfth grade only about 60 per cent of the 
age group are in school. In this New York 
ranks low among the comparable states.¹


* Submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy in the School of Education
of New York University, 1941.


¹ The Regents’ Inquiry, Education for American Life, p. 7.




The high school program in New York 
State has had a predominantly academic 
flavor. This fact presented a problem for the 
high schools when the public began to demand 
a secondary education for all of its boys and 
girls, regardless of their intellectual capacities.

Most of the duller pupils did not custo- 
marily continue their education through the 
high school grades prior to the period of the 
first World War. A report by the Governor’s 
Committee² shows that, whereas there were
207,556 pupils in the high schools of the state 
in 1920, by 1932 there were 572,874 pupils, 
an increase of 365,318 or 176 per cent. Prior 
to this great influx of pupils into the sec- 
ondary schools of the state, the assumption 
of nearly everyone, state department officials, 
public school officials, and the public alike, 
was that nearly every pupil who sought grad- 
uation would qualify for a state diploma. It 
is no secret that many of the boys and girls 
who have come into the secondary schools 
since the first World War have not been able 
to qualify for the state diplomas. The state 
compulsory education law nevertheless com- 
pelled their attendance until they reached 
sixteen years of age, and even after reaching 
sixteen most of them could not find gainful 
employment. The policy of the state of New 
York was to continue to distribute state aid 
to the schools largely on the basis of average 
daily attendance. A school received as much 
state aid for a dull pupil as for a bright one. 
However, the state education department did 
not keep pace in its diploma-awarding policy 
with its financial-assistance policy; that is, it 
failed to offer a diploma for which all the 
dull pupils could qualify, though it continued 
to distribute state aid to make possible their 
continued attendance if they so chose. This 
situation left the individual school faced to a 
large extent with the determination of pro- 
motion standards and diploma-awarding cri- 
teria for those pupils who wished to remain 
in school but could not meet the Regents
level.


2 Report of the Governor's Committee on the Costs of 
Public Education in the State of New York, 1933, pp. 7-16. 
Sources and procedure.—Each of the three
experimental schools in this study was
individually matched against a traditional
Regents school on the following eight points:


1. Amount of money spent per pupil for
current expenses
2. Geographical proximity to each other
3. Percentage of total school monies raised
locally
4. Assessed valuation back of each pupil in
average daily attendance
5. Tax rate per thousand
6. Type of community (suburban or rural)
7. Faculty
   a. Educational training of members
   b. Years of experience
   c. Number of pupils per teacher in
   average daily attendance
8. Opportunity for employment of youth


The Henmon-Nelson Test of Mental Abil- 
ity was administered to the seniors and juniors 
in the six high schools. One hundred and four 
non-Regents pupils in the experimental 
schools were then matched individually 
against the same number of pupils in the 
traditional Regents schools on the five fol- 
lowing bases: 


1. Mental age
2. Chronological age
3. Sex
4. Grade in school
5. Intelligence quotient


The experimental group and the control 
group were matched individually as far as 
possible, but more perfectly as a group on the 
following four bases: 


1. Resident or nonresident 

2. Parents born in this country or abroad 
3. Foreign language spoken in the home 
4. Occupation of father 
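Individual matching on the five bases listed above could be carried out in many ways, and the study does not describe its rule. As a hypothetical sketch only, the code below matches exactly on sex and grade and then picks the closest unused Regents pupil by an unweighted sum of differences in mental age, chronological age, and IQ. The distance measure, the greedy processing order, the `Pupil` record, and all function names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pupil:
    pupil_id: str
    sex: str
    grade: int
    mental_age: float   # months
    chrono_age: float   # months
    iq: float


def distance(a: Pupil, b: Pupil) -> float:
    # Hypothetical, unweighted closeness measure; the study gives no formula.
    return (abs(a.mental_age - b.mental_age)
            + abs(a.chrono_age - b.chrono_age)
            + abs(a.iq - b.iq))


def match_pupils(experimental: list[Pupil], control: list[Pupil]) -> list[tuple[str, str]]:
    """Greedy one-to-one matching: exact on sex and grade, nearest on the remaining bases."""
    unused = list(control)
    pairs: list[tuple[str, str]] = []
    for pupil in experimental:
        candidates = [c for c in unused if c.sex == pupil.sex and c.grade == pupil.grade]
        if not candidates:
            continue  # no admissible partner; this pupil stays unmatched
        best = min(candidates, key=lambda c: distance(pupil, c))
        unused.remove(best)  # each control pupil is used at most once
        pairs.append((pupil.pupil_id, best.pupil_id))
    return pairs
```

Greedy matching depends on the order in which experimental pupils are processed; an optimal assignment method would avoid that dependence at the cost of more computation.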


The first criterion of the success of the 
experimental and control schools in meeting 
the problem of adjustment to the needs of 
the duller pupils was measured by determin- 
ing which type of school retained a higher 
percentage of its entering freshmen through 
the junior and senior years. 

The second criterion of the schools’ success, 
that of the extent to which the pupil liked 
his experience of attendance, coupled with a 
conviction that the experience was socially
and culturally valuable, was measured by a
High School Experience Test, devised by the 
investigator. 

The third criterion of the schools’ success, 
that of enabling the pupils actually to attain 
some of the recognized aims of education, was 
measured by three tests: 

a. The Wrightstone Scale of Civic Beliefs 

b. The What Would You Do? Test 

c. The Myers—Ruch Progress Test 


Results.—The findings are as follows:

1. No appreciable difference was found 
between experimental schools and control 
schools as regards success in retaining their 
pupils. The pupils who left school during the 
senior or junior years, so far as the data re- 
veal, did not leave in greater numbers because 
the school made an adjusted offering or 
failed to do so. 

2. The pupils in the experimental schools 
indicated, through the High School Experi- 
ence Test, that they enjoyed the experience 
of attending, coupled with a conviction that 
the experience was socially and culturally val- 
uable, to a greater degree than the pupils in 
the traditional schools. Two forms of this test 
were employed. The pupils in the experi- 
mental schools scored higher on both forms. 
The D/PEp on Form A of the High School 
Experience Test was 2.28, and on Form B, 
2.4. Both of these critical ratios, however, 
fall short of 4, which is the minimum figure 
to insure significant superiority. 

3. The traditional Regents schools showed 
greater effectiveness on the third criterion, 
actual attainment of some of the recognized 
aims of secondary education (as measured by 
the (a) Wrightstone Scale of Civic Beliefs, 
(b) What Would You Do? Test, and (c) 
Myers—Ruch Progress Test). 

a. The Wrightstone Scale of Civic Beliefs: 
On this test the D/PEp is 3.15 in favor of 
the pupils in the Regents schools, which 
approaches statistical significance. 

b. What Would You Do? Test: The 
D/PEp is 1.44 in favor of the pupils in the 
Regents schools, which is not statistically 
significant. 

c. Myers-Ruch Progress Test: The 
D/PEp is 10.47, a quotient more than 2½
times enough to be significant.
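The D/PEp figures above express a difference in units of its probable error. Reading D/PEp as the difference divided by the probable error of the difference (an assumption about the garbled label), and using the standard normal-distribution relation that a probable error is 0.6745 times a standard error, the sketch below computes such a ratio and converts it to the more familiar ratio in standard-error units. The function names and the sample group values are hypothetical.

```python
import math

# For a normal distribution, the probable error is 0.6745 times the standard error.
PE_PER_SE = 0.6745


def d_over_pe(difference: float, se1: float, se2: float) -> float:
    """Difference between two independent group means divided by the probable error of that difference."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return difference / (PE_PER_SE * se_diff)


def pe_ratio_to_se_ratio(ratio_in_pe: float) -> float:
    """Re-express a D/PE ratio as a critical ratio in standard-error units."""
    return ratio_in_pe * PE_PER_SE


if __name__ == "__main__":
    print(round(pe_ratio_to_se_ratio(4), 2))      # 2.7  (the study's threshold of 4)
    print(round(pe_ratio_to_se_ratio(10.47), 2))  # 7.06 (the Myers-Ruch result)
    print(round(d_over_pe(5, 1, 1), 2))           # 5.24 (hypothetical groups)
```

On this reading, the study's minimum of 4 probable errors corresponds to roughly 2.7 standard errors, and 10.47 / 4 is about 2.6, which agrees with the text's "more than 2½ times enough".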


Conclusions.—The results of the experiment
suggest the following conclusions:

1. The adjustment of the educational
program to the duller pupils did not affect the
holding power of the schools.

2. The pupils for whom the educational 
program was adjusted liked their experience 
better than did the Regents pupils, though the 
difference is not statistically significant. 

3. The pupils in the traditional Regents 
schools scored higher on the two attitude
tests (though the differences are not statis- 
tically significant) and did much better on 
the Myers—Ruch Progress Test, which tests 
traditional subject matter, including English, 
social studies, mathematics, and science. 

Recommendations.—Suggestions for further
study and for adjustment are:

1. Further study is recommended to
discover why the holding power of New York
State high schools is not higher in the upper
years; the suggestion is made that, if the
adjustment to the needs of the duller pupil
can be more adequately made, this holding
power may possibly be increased.

2. Since the pupils in the experimental
schools liked their experience better,
meanwhile believing it to be socially and
culturally valuable, in spite of having to overcome
the handicap of prejudice against an adjusted
program in a state where the Regents program
is held almost sacrosanct, further study is
recommended to render the program still
better adjusted.

3. Because the pupils in the Regents 
schools attained a higher achievement in 
those attitudes measured by the two attitude
tests, and did much better than the pupils in 
the experimental schools on the subject- 
matter test, these recommendations are made: 

a. That the schools seeking to adjust their 
programs exercise full care to safeguard their 
pupils against receiving a thinned and mean- 
ingless offering; 

b. That further research be devoted to
ascertaining what subject-matter standards
are truly appropriate for duller pupils. The
suggestion is made that norms on a test, such
as the Myers-Ruch Progress Test, are not
perfect standards of achievement toward 
which a duller high school pupil should be 
directed. 





AN EXPERIMENTAL STUDY TO DETERMINE THE RELATIVE 
EFFECTIVENESS AT THE SECONDARY LEVEL 
OF TWO METHODS OF INSTRUCTION*


Hazel M. Hatcher
Michigan State College 


In the past, the training of high school 
teachers in colleges and universities focused 
upon mastery of the subject matter to be 
taught. In recent years emphasis has been 
placed on methods of teaching as well as on 
content, because it has been recognized that 
the teacher should not only possess technical 
information but also know how to adapt the 
content of her subject matter to produce the 
optimum development of students in her 
classes. Today there is much controversy 
over methods of teaching. Methods in use 
vary from rigid formality in classroom pro- 
cedure to unrestrained freedom of action; if 
the professional education of teachers is to be 
on a par with the technical training, it is 
important that educational research be util- 
ized to determine what methods best serve 
to produce the desired development of 
students. 


THE PROBLEM 


This investigation is an experimental study 
in home economics, made to determine the 
relative effectiveness at the secondary level 
of two methods of instruction, referred to 
throughout the study as the control and ex- 
perimental methods. The control method was 
wholly directed by the teacher who also de- 
termined the objectives, decided upon the 
content of the unit, planned the procedures 
to be followed, and evaluated the pupil’s 
achievement. In the experimental method, 
the teacher and pupils together determined 
the goals they wished to reach, decided how 
best to work toward these goals, and together 
checked accomplishment as the unit prog- 
ressed. 

The specific steps in the classroom proce- 
dure typifying the two methods are as 
follows: 


Control Method 


1. The teacher selected objectives and subject
matter for the unit. These were based on
pupil needs and the prescribed course of
study.

2. The teacher presented objectives to the
pupils and outlined the subject matter to be
covered in the unit.

3. As soon as the teacher presented objectives
to the pupils and outlined subject matter to
be covered in the course, she gave a pre-test,
explaining that it was for the purpose
of determining how much the pupils already
knew about some of the material to be
covered. She explained that the score made on
this test would not count toward the final
mark, but would provide information so that
she could check on individual improvement.
The teacher made the scores available to
the pupils who wished to know their
achievement.

4. The teacher used whatever method of
presentation seemed best according to her
judgment and made assignments when it
appeared to be desirable.

5. The teacher encouraged maximum pupil
participation in teacher-planned activities.

6. The teacher checked pupil progress at
intervals by use of check lists provided and
made her ratings available to those who
wished to know their achievement.

7. The teacher told the pupils early in the unit
how much check list ratings and other
measures, including examinations, would
count toward the final mark. For example,
the total score made on check lists, with
the exception of one for food needs, might
count two-thirds and other measures,
including examinations, might together count
the other third.

8. The teacher set the date for the final
examination and explained that it would include
material covered in class assignments.
After the examination had been given, the
teacher made the scores available to the
pupils in order that they might know their
achievement and improvement.


Experimental Method 


1. The teacher set up tentative pupil objectives based on pupil needs and provided situations or used devices to make pupils aware of these needs.

2. The teacher led class discussion in such a way as to enable the pupils to state these needs in the form of objectives, and then guided them to choose activities that would help them attain their objectives.

3. As soon as the pupils set up their objectives and activities for the unit, the teacher gave a pre-test, explaining that it was for the purpose of determining how much the pupils already knew about the goals that they had set. The teacher explained that the score made on this test would not count toward the final mark, but would provide information whereby the pupils could later check their own improvement; and she made test scores available to the pupils.

4. The teacher urged the class to keep plans flexible while engaging in activities that they believed would help them attain their objectives. For example, the class might commence to work as a whole and later might shift to small groups or to individual activity, or to both. The teacher and pupils together decided on assignments.

5. The teacher encouraged at every opportunity pupil participation in class activities that the group had planned.

6. The teacher provided self-evaluation devices for the pupils to check their own progress at intervals, and she also checked progress at the same intervals, using the same kind of check lists. She made her ratings available to pupils who wished to compare their ratings with those of the teacher. She provided opportunity for individual conferences in which ratings could be adjusted to the satisfaction of both teacher and learner.

7. The teacher and pupils decided early in the unit how much the check list ratings and other measures, including examinations, would count toward the final mark; they decided also whether all ratings should count equally. For example, teacher and learner might decide that the total score made on check lists, with the exception of the one for food needs, should count two-thirds and that other measures, including examinations, should together count the other third.

8. The teacher explained to the pupils early in the unit that the final examination would be in terms of their objectives (see No. 2) and would provide opportunity for individuals to check their own achievement and improvement. After the examination had been given, the teacher scored the papers and then allowed the pupils to check their achievement by comparing pre-test and final scores.


PROCEDURE 


It was decided early in the study that each 
participating teacher should be given the 
opportunity to decide whether she would use 
the control method, the experimental method, 
or both methods. It was recognized that such 
freedom of choice would preclude the use of
the rigorous statistical treatment that would 
have been possible had each teacher been as- 
signed to teach both methods. On the other 
hand, it was believed that the disadvantages 
would be more than compensated for by per- 
mitting teachers to use methods of their own 
choice. 

After the general plan for the research had 
been formulated, conferences were held with 
the supervisors, high school principals, and 
certain of the teachers in St. Paul and Minne- 
apolis to determine how the study could be 
made most helpful to all concerned. It was 
agreed that a twelve-week unit in foods and 
a four-week unit in consumer buying, both 
at the senior high school level, were most 
suitable. Demonstration lessons to show the 
differences between the control and the ex- 
perimental methods were taught and these 
were observed by the teachers who partici- 
pated in the study. Following the demonstra- 
tion lessons, the teachers indicated the assist- 
ance they wished to have to enable them to 
carry on their part of the study with a minimum of error. These demonstration lessons were recorded by stenotype, and copies were furnished to the participating teachers so that they would not have to rely on memory to keep clearly in mind the differences between the lessons as taught by the two methods. They were also
furnished with detailed directions regarding 
the procedures to be used in the method or 
methods each proposed to use. 

The investigator in charge of the study 
selected objectives in the areas of foods and 
consumer buying that an earlier investigation 
had indicated were likely to be the objectives 
of almost all pupils at the senior high school 
level. She developed teaching aids and meas- 
uring devices to supplement those already 
available and these materials were furnished 
to the teachers using the control method and 
to both teachers and pupils in the classes to 
be taught by the experimental method. 

The measuring devices used in the foods 
classes, in addition to a pencil-and-paper test, 
were of three types: food score cards for 
measuring the quality of the foods prepared 
by the pupils,² a check list for measuring the
adequacy of the meals listed by pupils on a 
special record form as having been eaten for 
one week, and a series of six self-teaching and 


2 Published by the University of Minnesota Press, Minne- 
apolis, in 1939.
self-evaluating devices planned for checking
on the different objectives in the foods unit.³
The measuring device in the consumer- 
buying classes, in addition to a pencil-and- 
paper test, consisted of diary records kept of 
purchases made during one month. The 
teachers in consumer-buying classes were fur- 
nished also with certain teaching materials, 
because it was thought that they would be 
less familiar with such content than with that 
usually taught in foods classes. 

Additional techniques, such as interviews 
with pupils and their mothers, anecdotal rec- 
ords, and written reports made by teachers 
and learners were used to collect data on the 
less tangible aspects of learning such as in- 
terests, attitudes, and habits, which it was 
assumed might be developed as a result of 
instruction. 

Every possible means was used to elim- 
inate bias in the selection of the classes to be 
used in the experiment. The classes were 
selected at random from among those taught 
by each participating teacher and, if she used 
both methods, the classes to be taught by 
each method were also selected at random. 

There seemed to be no consistent difference 
between the teaching ability of the group of 
teachers using the control method and the 
ability of those using the experimental 
method in terms of the years of experience, 
degrees held, and supervisors’ ratings, but the 
teachers of foods, on the whole, were rated 
somewhat higher than the teachers of con- 
sumer buying. 

All classes were taught daily and, with the 
exception of the classes in one school, all were 


3 Developed by the investigator and published by the Burgess Co., Minneapolis, in 1940.
scheduled for one hour (including the time
between classes). In this one school, both 
teachers used both methods of instruction. 
The equipment was very similar in the 
different schools. For cooking, small gas 
plates supplemented by large gas stoves were 
used in every laboratory. The arrangement 
of equipment varied in the different schools, 
but if there was any difference with respect 
to the quantity or quality of equipment in the 
control and the experimental classes, it 
seemed to be slightly in favor of the former. 


While the study was in progress, the in- 
vestigator visited each class at least twice to 
observe progress and to answer questions 
about procedures to be employed in the 
teaching. She also interviewed 36 pupils (a 
controlled sampling that represented various 
intelligence and socio-economic levels) and 
the mothers of the same pupils. 

Usable data were collected from 900 pupils in 35 high school classes in St. Paul and Minneapolis, and it was possible to pair 282 learners in the foods classes and 276 in the consumer-buying classes on the three variables showing the highest relationship to achievement, namely, IQ, pre-test score, and the socio-economic level as determined by the father’s occupation (see Table I). The slight differences in grade level and age between the groups have no statistical significance, so it seems justifiable to say that the groups are matched on grade and age as well as on the three other characteristics used as pairing bases.

Later it was found that the data for the 
entire control and experimental groups might 
have been used, because the unpaired pupils 


TABLE I

COMPARISON ON FIVE CHARACTERISTICS OF THE PAIRED PUPILS IN THE
CONTROL AND IN THE EXPERIMENTAL CLASSES

                                        Control Group   Experimental Group
Characteristics                         Mean            Mean
FOODS CLASSES (141 pairs)
  Grade level                           10.41           ...
  Chronological age (months)            193.94          ...
  Intelligence quotient                 104.55          ...
  Occupational status                   4.33            ...
  Pretest scores                        52.48           ...
CONSUMER-BUYING CLASSES (138 pairs)
  Grade level                           11.58           ...
  Chronological age (months)            207.87          ...
  Intelligence quotient                 107.90          ...
  Occupational status                   4.38            ...
  Pretest scores                        81.14           ...
in the control and in the experimental classes
were similar to the paired learners with re- 
spect to each of the five characteristics and 
also with respect to relative achievement. 


FINDINGS 


The findings are discussed separately for 
the foods and for the consumer-buying 
classes. 


Foods classes—Table II shows that, in 
terms of scores on the final pencil-and-paper 
test, ratings on meal preparation, and scores 
on food products, the paired pupils in the 
experimental group were superior to those 
with whom they were paired in the control 
group. In each instance the critical ratio was 
greater than three, which considerably ex- 
ceeds the one per cent level of significance 
usually deemed sufficient to indicate that the 
difference between the achievement of the 
two groups was not the result of chance.⁴


The difference in the test scores of the two 
groups showed a critical ratio of 5.36; the 
difference in teachers’ ratings on the meals 
(using the devices developed by the inves- 
tigator) showed a critical ratio of 7.62; and 


4 A critical ratio of 3 indicates that a difference this large would be expected to arise from chance in only about 3 cases in 1000.
the difference in teachers’ ratings on the
products prepared by the pupils (using the 
food score cards) showed a critical ratio of 
3.66, with the difference in each case in favor 
of the experimental group. 
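The critical ratio used throughout is the difference between two means divided by the standard error of that difference. The short Python sketch below, offered only as an illustration, recomputes two of the ratios from the difference and standard-error figures printed in Table II; the assignment of those figures to the test and food-product rows is our reading of the damaged table. The helper names are hypothetical, and treating the footnote's "3 cases in 1000" as a two-tailed normal probability is our assumption about how that figure was obtained.

```python
import math


def critical_ratio(difference: float, se_difference: float) -> float:
    """Critical ratio: a difference between means expressed in standard-error units."""
    return difference / se_difference


def two_tailed_chance(cr: float) -> float:
    """Normal-curve probability of a deviation at least |cr| in either direction."""
    return math.erfc(cr / math.sqrt(2))


# (difference, SE of the difference) as read from Table II (our reading).
foods_measures = {
    "pencil-and-paper test": (4.61, 0.86),
    "scores on food products": (8.06, 2.20),
}

for name, (diff, se_diff) in foods_measures.items():
    print(f"{name}: CR = {critical_ratio(diff, se_diff):.2f}")

# Footnote check: a CR of 3 would arise by chance roughly 3 times in 1000.
print(f"chance of |CR| >= 3: {two_tailed_chance(3.0):.4f}")
# The one per cent level corresponds to a CR of about 2.58.
print(f"chance of |CR| >= 2.576: {two_tailed_chance(2.576):.4f}")
```

The two printed ratios agree with the 5.36 and 3.66 reported in the text. The meal-preparation row is omitted because its standard error of the difference cannot be read from the table.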


Further evidence of the superior achieve- 
ment of the pupils in the experimental classes 
is shown in Table III, in which the dietary 
practices of the two groups are compared. 
Since the dietary records were not signed, it 
was not possible to isolate those of the paired 
pupils, so all individuals in the control and 
in the experimental classes were compared. 
It will be seen that, in the case of the control group, the mean rating on the dietaries was slightly lower after instruction than before (by .11). Although the loss was too small to have any statistical significance, it seems evident that class instruction did not result in improved dietary practices.

In contrast, the mean dietary rating of the 
experimental classes at the end of the period 
of instruction showed an improvement of 
2.54, a gain with a critical ratio of 6.18. When the mean ratings of the control and the experimental groups at the end of instruction were compared, the difference in favor of the experimental group had a critical ratio of 8.30.


TABLE II

RELATIVE ACHIEVEMENT OF PAIRED PUPILS IN THE FOODS CLASSES
TAUGHT BY THE CONTROL AND THE EXPERIMENTAL METHOD

Evidence of Achievement        Mean    SD      SE(M)   Difference   SE(diff)   CR
Pencil-and-paper test scores
  Control group                ...     6.54    0.56
  Experimental group           ...     7.79    0.66    4.61         0.86       5.36
Ratings on meal preparation
  Control group                ...     15.44   1.30
  Experimental group           ...     10.98   ...     12.13        ...        7.62
Scores on food products
  Control group                ...     17.79   ...
  Experimental group           ...     19.16   1.61    8.06         2.20       3.66


TABLE III

RELATIVE IMPROVEMENT IN DIETARY PRACTICES MADE BY THE
CONTROL AND BY THE EXPERIMENTAL CLASSES

Dietary Ratings              Mean    SD     SE(M)   Difference
Control group
  1. Before instruction      27.68   5.04   0.31    M2 - M1 = -0.11
  2. After instruction       27.57   4.66   0.29
Experimental group
  3. Before instruction      28.36   4.94   0.31    M4 - M3 = 2.54
  4. After instruction       30.90   ...    0.27    M4 - M2 = 3.33
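The experimental group's gain of 2.54 and its critical ratio of 6.18 can be reproduced from the before-and-after means and standard errors in Table III if the two sets of dietary ratings are treated as uncorrelated. That is our assumption: the source does not state which formula was used, though the unsigned dietary records would in any case have prevented linking ratings pupil by pupil. A minimal Python sketch follows; the function name and the optional correlation parameter `r` are hypothetical.

```python
import math


def se_difference(se_a: float, se_b: float, r: float = 0.0) -> float:
    """Standard error of the difference between two means.

    With r = 0 the means are treated as uncorrelated. A positive r
    (e.g. the same pupils measured twice) would reduce the result.
    """
    return math.sqrt(se_a**2 + se_b**2 - 2 * r * se_a * se_b)


# Experimental group, Table III: mean and SE before and after instruction.
mean_before, se_before = 28.36, 0.31
mean_after, se_after = 30.90, 0.27

gain = mean_after - mean_before
se_gain = se_difference(se_before, se_after)
print(f"gain = {gain:.2f}, SE = {se_gain:.3f}, CR = {gain / se_gain:.2f}")
```

With these figures the standard error of the gain is about 0.411, and the ratio rounds to the 6.18 given in the text.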
Other analyses of the food habits of pupils
showed the consistent superiority of the ex- 
perimental classes, regardless of whether 
comparisons were made in terms of the num- 
ber of pupils who practiced certain bad habits 
or the number of times these practices were 
listed. Certain undesirable practices disap- 
peared completely in the experimental classes, 
such as that of eating only a candy bar for 
lunch. More improvement was evident in the 
case of pupils under 16 years of age than of 
those 17 years or older. 


In view of the following facts, considerable 
faith may be placed in the evidence cited: 
(1) the pencil-and-paper test had a coefficient 
of reliability of .95 for the group of senior 
high school pupils participating in the experiment; (2) the food score cards had been shown to have coefficients of objectivity of approximately .90 in an earlier study;⁵ (3) the dietary check list had been found not only so objective that two trained raters gave identical scores to the week’s dietaries of a group of adolescents, but its ratings also correlated very markedly with the ratings on the same dietaries when they had been analyzed in terms of the number of grams of protein and minerals and the number of units of the different vitamins supplied; (4) although specific evidence of the objectivity of the evaluating devices used in rating the meals was not obtained in this study, similar devices used in other investigations at the University of Minnesota have been found to have coefficients of objectivity of .90 or higher when used by trained raters.


In addition to the comparisons already de- 
scribed for the relative achievement of the 
matched pupils in all of the control and the 
experimental foods classes, certain compari- 
sons were made with individual classes. 
These are described in the following para- 
graphs. 


5 Clara M. Brown, A Study of Science Prerequisites and
Certain Sequent Courses at the University of Minnesota. 
Minneapolis: Committee on Educational Research, University 
of Minnesota, 1941. Pp. 94. 
The achievement of two ninth-grade and
two tenth-grade classes taught by the same 
teacher, who used both methods, was com- 
pared. Each class was composed of approxi- 
mately 25 pupils and was taught for 12 weeks 
in the same classroom for the same length of 
period per day. The achievement of the 
pupils in the experimental classes in both the 
ninth and the tenth grades in this school was 
found superior to that of the control classes 
in the corresponding grades. For the classes 
in this particular school it was possible to 
apply the Johnson–Neyman analysis, a more rigorous technique than that of determining the significance of the difference between the matched groups in terms of the critical ratio. The fact that conclusions similar to those previously described were reached after the data had been analyzed by the Johnson–Neyman technique would seem to give added weight to the findings already reported.


The dietary practices of two classes, each 
taught by a superior teacher, were compared; 
the pupils in the experimental class showed 
definite improvement after instruction, where- 
as those in the control class had a slightly 
lower rating after instruction than before. 


The final comparison of achievement under 
the two methods of instruction was made by 
contrasting the ratings on food practices of 
two classes, one taught by the teacher who 
was rated as the best of those using the con- 
trol method and the other taught by the 
teacher rated as the poorest of those using the 
experimental method. Even in this compari- 
son, the experimental class made somewhat 
greater improvement than did the control 
class. 


Consumer-buying classes —Table IV shows 
the relative achievement of the control and 
the experimental classes in consumer-buying 
on the pencil-and-paper test, which had a 
reliability of .91. It will be seen that the mean 
achievement of the experimental classes was 
significantly higher than the achievement of 
the control classes, the difference having a 


TABLE IV

RELATIVE ACHIEVEMENT IN CONSUMER-BUYING CLASSES ON PENCIL-AND-PAPER
TEST BY GROUPS OF PAIRED PUPILS

Group                  Mean    SD     SE(M)   Difference   SE(diff)   CR
Control group          36.10   5.68   0.48
Experimental group     ...     6.12   0.52    ...          .707       4.20
critical ratio of 4.20. In interpreting the sig-
nificance of this critical ratio, one should keep 
in mind the fact that the unit was relatively 
brief, covering only a four-week period. 


The difference in favor of the pupils in the 
experimental classes was evident in the case 
of those who had not been paired as well as 
in the case of the paired pupils, so it would 
seem that the paired pupils were typical of 
the total population in the consumer-buying 
classes used in the experiment and that the 
experimental method was more effective than 
the control method. 

An analysis of the diaries kept by the 
pupils showed that those in the experimental 
group tended to have greater knowledge 
about the purchases they made, that they 
obtained more real information from clerks, 
and that they showed more evidence of
learning in specific situations than did pupils 
in the control group. 

The case studies based on interviews with 
pupils and their mothers seemed to indicate 
far greater interest and a much better atti- 
tude on the part of those in the experimental 
classes than in the control classes, in both 
foods and consumer-buying; and their prac- 
tices seemed to be more favorably affected as 
a result of instruction. 


Pupils’ comments indicated a better atti- 
tude toward school work, more interest in 
learning, and the development of greater in- 
itiative, independence, and judgment in the 
case of those in experimental classes than in 
control classes. 

Written comments by the teachers and their 
statements, recorded by stenotype in a con- 
ference following the completion of the ex- 
periment, indicated that the pupils in the 
experimental classes tended to show more 
interest in class work and greater initiative; 
they worked more independently and showed 
better judgment. Moreover, the teachers were 
apparently convinced that even better results 
could be obtained with the experimental 
method after it was more thoroughly under- 
stood and used with greater ease. 


CONCLUSIONS AND RECOMMENDATIONS 


Since the sample used in this study included 
pupils from two-thirds of the senior high 
schools in St. Paul and Minneapolis, and 
represented approximately half of the pupils 
in the foods and consumer-buying classes in 
the senior high schools in these cities, one
might expect to have pupils in other Twin 
City senior high school classes show similar 
achievement, if they were subjected to the 
same teaching procedures as those used in 
this investigation. The fact that the schools 
from which the classes were drawn repre- 
sented socio-economic levels ranging from the 
lowest to the highest in the cities would 
strengthen this supposition. 


To what extent the findings are applicable 
to senior high school classes in general can 
only be conjectured. It would seem that the 
evidence presented is sufficiently consistent 
and so definitely in favor of the effectiveness 
of the method of instruction used in the ex- 
perimental classes that teachers, supervisors, 
and others interested in promoting better 
teaching would be encouraged to experiment 
with the newer method. 


Although it was shown that the classes 
taught by the experimental method achieved 
significantly better than those taught by the 
control method in all statistical comparisons, 
and that the poorest teacher using the ex- 
perimental method was able to change the 
food practices of her pupils somewhat more 
than the best teacher using the control 
method, the investigator would not recom- 
mend that all home economics teachers use 
the experimental method, because the effec- 
tiveness of the method probably lies in part 
in a voluntary acceptance of the psychological 
principles upon which the experimental 
method was based. To recommend its use 
arbitrarily would be equivalent to using the 
technique of the control method with teachers 
in much the same way the teachers used it 
with their pupils. 

The investigator believes that this study 
has clarified the newer method to the extent 
that those who desire to use it, or to help 
others learn to use it in the typical classroom 
situation, may be able to do so more effi- 
ciently than would have been possible if the 
study had not been made. As a result of her 
own experience, she would like to suggest 
that student teachers be given opportunities 
to observe both the control and the experi- 
mental methods in operation in the classroom 
in order that they may reach their own con- 
clusions regarding the merits of each method. 
She would suggest also that supervisors help 
teachers who desire to employ the experi- 







September, 1941] 


mental method to use it so successfully that 
their own enthusiasm and that of their pupils 
will cause other teachers to experiment with 
the method on their own volition. 

The reader should recall that the experi- 
mental method involved the use of the self- 
teaching, self-evaluating devices as well as 
the utilization of cooperative efforts of 
teacher and pupils in setting up goals and
determining class procedures. How much of 
the apparent superiority of the experimental 
method, as used in the investigation, depended 
upon the special instructional materials and 
how much upon pupil participation in plan- 
ning class work is an open question, and 
further research will be needed to answer it. 





AN EXPERIMENTAL STUDY OF THE EFFECT ON HIGH 
SCHOOL SOPHOMORES OF TEACHING ENGLISH 
WITH EMPHASIS ON GUIDANCE 


FANNIE MAE CROWE 
Stanford University 


Introduction.—The idea of guidance in education is possibly as old as education itself. The idea is not new, but it received little attention until the present century. The school has always aided a few of its pupils to find their occupations, but this service favored those of station and intellect rather than all pupils. The field next expanded to include educational guidance. When recognition of the guidance function became pronounced a few years ago, guidance became a much-discussed subject, and thus came to be another educational term with a variety of meanings.

Guidance differs from the work of the 
classroom teacher in that it also includes the 
analyzing of needs and the giving of individ- 
ual counseling. The contribution of the class- 
room teacher is necessary, but the total re- 
sponsibility of guidance cannot be fulfilled 
through the teaching of subject matter alone. 
Specialized guidance workers are almost help- 
less without the co-operation of the classroom 
teacher, and the teacher may often be seri- 
ously handicapped without the help of some- 
one who can assume the specialized functions 
of guidance. Thus, there is a need for all activities of the school to be co-ordinated if the objectives of guidance are to be realized. Little, however, has been done concerning the integration of guidance with the subject matter that is taught in the classroom.

Purpose of this study.—The purposes of
this study have been: 


1. To discover the needs and interests of 
tenth grade pupils. 

2. To help pupils to discover and under- 
stand their needs. 

3. To organize and teach units of study 
that would meet satisfactorily as many 
of these needs as was possible. 

4. To test pupils receiving this instruction for growth in character, personality, emotions, and attitudes. This was labeled guidance achievement.

5. To compare the guidance and scholastic achievement of the experimental pupils, those receiving instruction in the new units of study, with that of pupils receiving the conventional method of instruction found in most classrooms. Pupils receiving the latter type of instruction made up the control group.


Place of study.—The study was conducted 
in Mt. Vernon, Illinois, a town of fifteen 
thousand inhabitants, located on the northern 
edge of the coal field and the southern edge 
of the oil field. 

The Mt. Vernon Township High School 
has an approximate enrollment of twelve 
hundred pupils. The faculty consists of 
forty-two members. The school plant includes 
five large buildings, all of which are excep- 
tionally well equipped. 

Pupil needs.— During the school year 
1939-1940 the writer taught two tenth-grade 
English classes for which she reorganized the 
course of study into units of study that em- 
phasized the meeting of pupil needs. Before 
pupil needs could be met it was necessary to 
determine those needs and interests. A list of 
adolescent needs was compiled from a sur- 
vey of the literature by authorities in this 
field.1 These needs were assumed to be basic 
to all high school pupils. Although these basic needs could be determined from the writings of others, Mt. Vernon pupils had needs and interests that differed from those of pupils in other localities.

1 Luella Cole, Psychology of Adolescence. New York: Farrar and Rinehart.
Commission ... Education, The Social Studies in General Education. Chicago: Progressive Education Association, 1939.
Leta S. Hollingworth, The Psychology of the Adolescent. New York: D. Appleton and Co., 1928.
Ern..., Social ... of Adolescence. New York: Prentice-Hall, 1938.
Daniel Alfred Prescott, Emotion and the Educative Process. Washington: American Council on Education, 1938.
Caroline Beaumont Zachry, Personality Adjustments of School Children. New York: Charles Scribner’s Sons, 1929.
The dean of girls and the dean of boys
designed a questionnaire of 107 items to de- 
termine the interests of the pupils of Mt. 
Vernon. Each question had three possible 
answers—very much interested, somewhat 
interested, or not at all interested. The items 
centered around twelve main topics—home, 
finance, religion, educational growth, recrea- 
tion, relationships with others, personality 
adjustment, vocations, personal standards and 
law, parliamentary procedure, social-civic 
law, and health. The questionnaire was ad- 
ministered to the twelve hundred members of 
the student body early in the school year,
1939-1940. 

The writer determined the needs and in- 
terests of the fifty-six tenth grade pupils of 
her English classes by giving to them the 
“Interests and Activities” questionnaire² of
the Progressive Education Association. This 
questionnaire consists of two hundred items, 
each of which has three possible answers— 
like, dislike, or indifferent. Special needs and 
interests that required emphasis were ascer- 
tained through conferences which the writer 
held with the principal and other teachers, 
and from her knowledge gained from previous 
teaching and from directing various activities 
of tenth grade pupils. 

It was not feasible to attempt to meet all 
tenth grade pupil needs and interests through 
the English class. Many of the needs were 
met through administrative procedures, as- 
sembly programs, club activity, and the 
recreational program of the school. 

The needs to be met in the experimental 
English classes were: 

1. Social
   a. Personality adjustment
   b. Relationships with others
      (1) Family
      (2) Same sex
      (3) Opposite sex
2. Educational
3. Vocational
4. Health and personal needs
5. Recreational
6. Philosophical
   a. Personal standards and laws
7. Religious
8. Cultural


2 “Interests and Activities” (8.2b), Evaluation in the Eight 
Year Study, Progressive Education Association. 
Units of study.—A unit of study was constructed around a group of related needs.
The materials from the text³ that might be
used in the discussion and fulfillment of these 
needs were selected for study. All poems, 
short stories, novels, non-fiction, dramas, etc., 
pertaining to a unit were included irrespective 
of literary division. Selections from the text 
were supplemented with material from the 
school, city, and instructor’s libraries. These 
were studied in class, or used for individual 
reports. 

The needs were discussed with the students 
in order that they might know and under- 
stand the purpose of the unit. Sometimes this 
discussion came at the beginning of a unit, 
while at other times it grew out of a discus- 
sion of the literature as it was studied and 
interpreted. The study of each unit was either
accompanied or followed by a discussion
that emphasized ways in which needs could 
be met by the pupils. They were urged to 
test these new ideas and suggestions. 

The order in which the units were pre- 
sented was determined by the most dominant 
pupil interests at the time. 

Some needs could be met directly through 
the study and discussion of the literature, 
while others were met indirectly through im- 
plication, teacher personality, and democratic 
class procedures. 

Placing emphasis upon guidance by no 
means justified the failure to teach subject 
matter. Near the beginning of the school year 
some time was given to a survey and review 
of the history and form of literature. As a 
selection was studied, it was fitted into this 
general pattern. Grammar, word study, rules 
for punctuation, composition form, etc., were 
taught as the teacher sensed the students’ 
need for them. 

Throughout the year nine units of study 
were taught to the classes. The titles of these 
units, with a summary of one of them, the 
last, follow: 


1. How to Study 
2. Feeling Socially Secure 
a. A verbatim report of this unit, as it 
took place in class, may be found in 
another study by the writer.⁴
3 H. C. Schweikert, H. A. Miller, and Luella B. Cook, Adventures in Appreciation. New York: Harcourt, Brace and Co.
4 Fannie Mae Crowe, “An ... in Integrating Guidance With the Tenth Grade English Course in the Mt. Vernon Township High School.” Unpublished Doctor’s field study, Colorado State College of Education, 1940.
3. Relationships With One's Family

4. Boy-Girl Relationships

5. The Wise Use of Leisure Time

6. Vocations

7. A Philosophy of Life

8. Religion

9. Health

a. Needs 

(1) To be met directly—The stu- 

dent needs to 

(a) Evaluate properly the ben- 
efits of health; 

(b) Adjust satisfactorily to in- 
dividual physical appear- 
ance and peculiarities; 

(c) Make well-balanced adjust-
ment to all situations. 

(2) To be met indirectly 

(a) Be assured that he is devel- 
oping normally and under- 
stand that the physiolog- 
ical changes through which 
he is passing are evidences 
of normal development and 
are not occasions for worry;

(b) Recognize and understand
himself as subject to the
laws of living creatures;

(c) Develop a wholesome,
healthful body;

(d) Develop habits conducive 
to a healthful life. 

b. Materials 
(1) Textbook 

(a) Robert Louis Stevenson, 
The Vagabond. 

(b) Wilfrid Wilson Gibson, Re- 
becca Nixon and Martha 
Waugh. 

(c) Robert Louis Stevenson, 
The Strange Case of Dr. 
Jekyll and Mr. Hyde. 

(d) Edgar Allan Poe, The Fall 
of the House of Usher. 

(e) Paul de Kruif and Sidney
Howard, Yellow Jack.

(2) Supplementary 

(a) Elma Halloway, Unsung 
Heroes. 

(b) L. Kerschbaumer, "Art and
Mental Health," Avocations
(June, 1939).


c. Class Activity 

(1) Discussions of health needs, both 
physical and mental. 

(2) The class listed ways in which 
one can adjust to physical 
handicaps. 

(3) Oral reports were made by the
pupils on the following people,
who made satisfactory adjustments
to handicaps:

(a) Helen Keller 

(b) Teddy Roosevelt 

(c) John Milton 

(d) Robert Louis Stevenson 
(e) William Henry Eustis 
(f) Dr. Harriet McGraw 
(g) Mrs. Harriet Pullen 


(4) The class discussed unsatisfactory
ways of meeting mental
maladjustments and then listed
satisfactory ones.


(5) The class listed mental hygiene 
rules. 

(6) The class discussed the contri- 
butions of science to the phys- 
ical welfare of humanity. 


The controlled experiment.—A controlled
experiment of the equated-groups type was
set up during the second year (1940-1941)
of the study so that a testing program
could be carried out. Two tenth grade
English classes composed of fifty-nine pupils, 
called the experimental group, were taught by 
the writer, using the integrated guidance 
method described above. Two other tenth 
grade English classes composed of fifty-seven 
pupils, called the control group, were taught 
by another instructor using the conventional 
type of teaching. In the latter classes no spe- 
cial emphasis was placed upon the meeting 
of student needs; the teaching of subject 
matter was the chief emphasis. 

In the equated groups, pupils were paired
according to I. Q., achievement, sex, and
chronological age. I. Q. and achievement
scores were obtained from The Illinois General
Intelligence Scale,⁵ The Psychological
Examination for High School Students,⁶ and
The Diagnostic Examination of Silent Reading
Abilities.⁷ These scores were converted
into equivalent scores in a new distribution,
the mean of which was 50 and the sigma 14.
An average of the three scores was found for
each pupil.

⁵ The Illinois General Intelligence Scale. Bloomington,
Illinois: Public School Publishing Co.

⁶ L. L. Thurstone and Thelma Gwinn Thurstone, American
Council on Education Psychological Examination for High
School Students. Washington, D. C.: American Council on
Education, 1940.
After the pairing was done, the fifty best 
pairs were selected for the experiment. 
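The conversion described above, putting each of the three tests on a common scale with a mean of 50 and a sigma of 14 and then averaging each pupil's three converted scores, can be sketched as follows. This is a minimal illustration, assuming a simple linear transformation based on the population standard deviation; the abstract does not say whether a linear or a normalized conversion was used, and the function names and pupil scores are hypothetical.

```python
from statistics import mean, pstdev

def rescale(scores, new_mean=50.0, new_sd=14.0):
    """Linearly convert raw scores to a scale with the given mean and sigma.
    Uses the population standard deviation (an assumption)."""
    m = mean(scores)
    sd = pstdev(scores)
    return [new_mean + new_sd * (x - m) / sd for x in scores]

def composite(*tests):
    """Rescale each test separately, then average each pupil's scores."""
    rescaled = [rescale(scores) for scores in tests]
    return [mean(pupil) for pupil in zip(*rescaled)]

# Hypothetical raw scores for four pupils on three tests
intelligence = [98, 110, 121, 105]
psychological = [62, 75, 88, 70]
reading = [40, 52, 47, 45]
print([round(c, 1) for c in composite(intelligence, psychological, reading)])
```

Rescaling each test before averaging keeps a test with a large raw-score spread from dominating the composite used for pairing.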
Guidance achievement.—In order to test 
the guidance achievement of the two groups, 
the following tests were given near the be- 
ginning of the year and again at the close of 
the year: 
1. The Minnesota Scale for the Survey of
Opinions⁸
2. The Neymann-Kohlstedt Diagnostic Ex-
amination for Introversion-Extroversion⁹
3. The Feder Emotional Attitude Test


In order to find whether each group had 
grown in its opinions, attitudes, and emotions 
during the year, the standard error of the 
difference between the mean at the beginning 
of the year and the mean at the end of the 
year, and the reliability of the difference of 
the means, were found. The difference be- 
tween the means was called positive if it 
favored the rating at the end of the year, thus 
indicating growth, and negative if it favored 
the rating at the beginning of the year. 
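A minimal sketch of the computation described above, assuming the usual formula for the standard error of a difference between two means. The "reliability" of a difference in this study appears to be the difference divided by its standard error (the critical ratio). The correlation term `r` is an assumption added for illustration: scores from the same pupils at the beginning and end of the year are usually correlated, but the abstract does not report whether this was taken into account. Function names and numbers are hypothetical.

```python
import math

def se_of_mean(sd, n):
    """Standard error of a mean from a standard deviation and sample size."""
    return sd / math.sqrt(n)

def critical_ratio(mean_begin, se_begin, mean_end, se_end, r=0.0):
    """Return (difference, SE of the difference, ratio).

    The difference is end minus beginning, so a positive value indicates
    growth, following the sign convention in the text. r = 0 treats the two
    means as uncorrelated (hypothetical default).
    """
    diff = mean_end - mean_begin
    se_diff = math.sqrt(se_begin**2 + se_end**2 - 2 * r * se_begin * se_end)
    return diff, se_diff, diff / se_diff

# Hypothetical example: 50 pupils tested at the beginning and end of the year
se_b = se_of_mean(10.0, 50)
se_e = se_of_mean(11.0, 50)
print(critical_ratio(60.0, se_b, 62.5, se_e, r=0.6))
```

A positive correlation between the two testings shrinks the standard error of the difference, so the same raw gain yields a larger ratio.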

On the Minnesota Scale the experimental
group showed a difference of plus .06 in favor
of the score at the end of the year. The reli-
ability of this difference is .04; that is, the
chances are 52 in 100 that the group would, on
the average, score higher at the end of the
year than at the beginning.

On the same scale the control group showed
a negative difference of 3.72 in favor of the
beginning of the year. The reliability of this
difference is 2.28; that is, the chances are 99 in
100 that the group would score higher at the
beginning of the year than at the end.
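The "chances in 100" quoted in these two paragraphs are consistent with reading each ratio against the normal curve: the chances that the true difference lies on the same side of zero as the observed one are 100 times the cumulative normal probability at the ratio. A small sketch under that assumption (the function name is hypothetical) gives 52 in 100 for a ratio of .04, 99 in 100 for a ratio of 2.28, and an even 50 in 100 for a ratio of zero.

```python
import math

def chances_in_100(ratio):
    """Chances in 100 that the true difference has the same sign as the
    observed difference, using the normal curve: 100 * Phi(|ratio|)."""
    phi = 0.5 * (1.0 + math.erf(abs(ratio) / math.sqrt(2.0)))
    return round(100 * phi)

for ratio in (0.04, 2.28, 0.0):
    print(f"ratio {ratio:.2f}: {chances_in_100(ratio)} chances in 100")
```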

On The Neymann-Kohlstedt Diagnostic Test
and The Feder Emotional Attitude Test, both
groups showed negative differences. On both
tests the reliability of the differences was
greater for the control group than for the
experimental group.

For the same three tests at the close of the
year, the difference between the means of the
two groups was found. If the difference
favored the experimental group it was called
positive; if it favored the control group, neg-
ative. On The Minnesota Scale the difference
of the means between the groups was plus
3.42 in favor of the experimental group. The
reliability of this difference is 2.23.

⁸ E. A. Rundquist and R. F. Sletto, Minnesota Scale for
the Survey of Opinions. Minneapolis: University of Minnesota.

⁹ The Neymann-Kohlstedt Diagnostic Test for Introversion-
Extroversion. Chicago: C. H. Stoelting Co., 1928.


ENGLISH WITH EMPHASIS ON GUIDANCE 51

The difference between the means on The 
Feder Emotional Attitude Test was plus 5.16 
in favor of the experimental group. The reli- 
ability of this difference is 2.9. 

On The Neymann-Kohlstedt Diagnostic
Test the difference between the means was 
zero and the reliability of this difference is 
zero; thus the chances are fifty-fifty that one 
group will rank higher than the other. 


Scholastic achievement.—To measure the
scholastic achievement of both the experi-
mental and control groups, objective tests¹⁰
published by the authors of the text were
administered. At the conclusion of a unit of
study the test corresponding to that unit was
given to the group.

On two of the eight tests, Silas Marner and 
Biography, the difference of the means was 
positive in favor of the experimental group. 
The reliability of the difference on the Silas 
Marner test is .48, and 1.31 on the Biography 
test. 

On the other six objective tests the differ-
ence of the means favored the control group.
On the Essay, Sohrab and Rustum, and Dr.
Jekyll and Mr. Hyde tests the reliabilities of
the differences are .06, .8, and 1.16 respec-
tively. For the three remaining tests, Yellow
Jack, As You Like It, and Idylls of the King,
the reliability of the difference of the means
is over 4.

Conclusions.—Based upon the results re-
ported, the following conclusions are pre-
sented:

1. The group given an integrated guidance 
method of instruction manifested greater 
growth in attitudes, emotions, and opinions 
than did the group receiving the conventional 
method of instruction. 

2. The group receiving the conventional 
method of instruction showed greater scho- 
lastic achievement than did the other group. 

3. The experimental group showed a grow-
ing interest in the study of literature, as
manifested by their class discussions, indi-
vidual conferences with the teacher, com-
ments on their free reading activities, and
discussions on essay type examinations.

¹⁰ Luella B. Cook, Tests to Accompany Adventures in
Appreciation.

Recommendations.—In the light of the
findings of this study, the following recom-
mendations are made:

1. Further study should be made in sec- 
ondary school literature classes in order to 
establish the reliability of the positive results 
of teaching literature by the integrated guid- 
ance method. It is also recommended that the 
same method of teaching be used in other 
high school classes where it seems feasible to 
do so, possibly in history, social studies, and 
civics, in order to find whether the procedure 
offers any advantages over the traditional 
method of teaching subject matter only. 

2. Another study should be made in the 
same school, using much the same methods 
of procedure in order to establish further the 
validity and reliability of the measuring in- 
struments used. 


a. In order to eliminate the element of the 
teacher variable, both groups should be 
taught by the same instructor. 

b. The number in each group would be en-
larged, and consequently the results 
rendered more reliable, if the instructor 
taught six classes of tenth grade Eng- 
lish, three classes to be used for the 
experimental group and three for the 
control group. 

c. More than three attitude tests should
be used. 

d. Attitude tests should be constructed by
the instructor of the experimental group. 
Objective tests prepared by someone 
else do not measure the attitudes, ideals, 
emotions, and standards that the in- 
structor has emphasized in teaching a 
particular group. 

e. Instruments for testing subject matter
as it is taught should be constructed and 
used in preference to those designed for 
testing subject matter as it might be 
taught by someone else. 


3. Pupils used in the present study should 
be followed up for the next two years in high 
school, and even after graduation from high 
school, in order to ascertain what longer-term
results this method of teaching may
produce. It is not possible to measure all re- 
sults of the teaching of guidance within the 
year in which it is taught. Some results may 
not show up until several months or even 
years later. 
4. A similar study should be made in
ninth, eleventh, and twelfth grade English 
classes; in history, civics, and social studies 
classes; and elsewhere if feasible. 
THE EFFECT OF A PROGRAM OF INITIAL INSTRUCTION ON
THE PRONUNCIATION SKILLS AT THE FOURTH-GRADE
LEVEL AS EVIDENCED IN SKILLS GROWTH*


RALPH W. HOUSE
Pennsylvania State College 


The problem.—There has been a long-
standing controversy as to the comparative
effectiveness of a word-recognition program
that consists of the study of letters and their
sounds and a word-recognition program made
up of the study of the total familiar arrange-
ment of the word. A review of the history of
American reading instruction (10) reveals
the forms which this controversy has taken.
For example, from 1607 to the present time
we have had no fewer than four periods, during
which classroom practice called for minute
word-analysis, for no word-analysis, or for
something between the two extremes.

The chief aim of reading instruction from 
1607 to 1840 was the accurate pronunciation 
of words by means of a reading program that 
emphasized mastery of the elements of words 
before the child was permitted to read. From 
1840 to about 1880 the reading program em- 
phasized the recognition of words by means 
of their configuration. From about 1880 to 
1915 there were the most elaborate phonic 
techniques for aiding word recognition that 
had ever been used in the reading programs 
of this country. From 1915 to about 1930 
emphasis was largely on silent reading for 
comprehension, with phonics for those pupils 
who needed such aid. 


All methods for promoting word recogni-
tion and word-analysis prove to be imprac-
tical, because modern readers are printed in
an incomplete symbolization. The pupil does
not know the value of the vowels and of
many of the consonants. Hence, he can arrive
at the correct pronunciation of a new and
difficult word in one of two ways:
(1) he must have someone pronounce the
word for him, again and again, until he can
recognize it by sight, or (2) he must be
taught how to use a complete symbolization
in making an independent analysis of the
word. The problem of the present study,
therefore, is to determine how successfully
fourth-grade pupils can master the use of a
complete symbolization, as measured by
independent analysis of unfamiliar words,
following a controlled experience with a
specific form of instructional material.


* Abstract of a doctorate dissertation, Pennsylvania State
College, 1940.


The experiment.—In order that the pupil
population might be as homogeneous as pos-
sible, the schools were selected by the follow-
ing criteria: (1) the classrooms must be free
from foreign accent; (2) the classrooms must
contain pupils with the usual range of mental
ability; and (3) the classrooms must contain
pupils who are making satisfactory progress
in their curriculum. The pupil population
used in this experiment numbered 222. The
pupils were divided into a control group and
an experimental group. The experimental
group was subdivided into three groups,
designated Group E₁, Group E₂, and Group
E₃, with 55, 53, and 56 pupils respectively.
The variable within the experimental groups
was the symbolization employed. Group E₁
used the regular spelling with diacritics.
Group E₂ employed the Webster system of
phonetic respelling with diacritics. Group E₃
used the International Phonetic Alphabet.
The parallel or paired group method was not
used in obtaining a control. Instead, the
Peters Regression Technique was employed
to obtain a hypothetical matching of the
groups. Using Dr. Peters' regression formulae,
a regression equation was obtained from the
mental age, reading age, and equating pro-
nunciation score of each pupil in the control
group. By means of this regression equation
a predicted final score was computed for each
pupil in the experimental groups. If the
predicted final score was less than the actual
final score for a pupil in the experimental
group, the experimental factor had a differen-
tial effect in favor of the experimental group.
If the predicted final score was greater than
the actual final score, the experimental factor
had a differential effect in favor of the
control group.
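The logic of the regression technique described above can be sketched with an ordinary least-squares multiple regression. This is an illustration, not Dr. Peters' own formulae, which the abstract does not reproduce: it assumes a linear equation with an intercept, predicting the final score from mental age, reading age, and the equating (initial) score, fitted on the control group only, and it uses NumPy. All function names and data are hypothetical.

```python
import numpy as np

def design(mental_age, reading_age, initial):
    """Predictor matrix with an intercept column."""
    return np.column_stack([np.ones(len(initial)), mental_age, reading_age, initial])

def fit_control(mental_age, reading_age, initial, final):
    """Least-squares coefficients predicting final score, control group only."""
    coef, *_ = np.linalg.lstsq(design(mental_age, reading_age, initial),
                               np.asarray(final, dtype=float), rcond=None)
    return coef

def mean_excess(coef, mental_age, reading_age, initial, final):
    """Average of (actual - predicted) final scores for an experimental group.

    A positive value means the pupils scored above what the control-group
    equation predicted, i.e. a differential effect favoring the
    experimental group; a negative value favors the control group.
    """
    predicted = design(mental_age, reading_age, initial) @ coef
    return float(np.mean(np.asarray(final, dtype=float) - predicted))
```

Fitting only on the control group means the equation describes how pupils of given mental age, reading age, and initial score progress without the experimental instruction, so any systematic excess in an experimental group is attributed to the experimental factor.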

The writer taught the three experimental
groups. A systematically developed program
of instructional materials was used. The
method of instruction was that of highly
motivated drill. The period of instruction
lasted fifty-four days. The pupils in the
experimental groups received twenty minutes 
of instruction daily. The control group was 
taught by the regular classroom teachers. 
They used a functional method with the 
difficult words encountered in the lessons 
taught. The teachers of the control group 
taught the skills needed whenever an occa- 
sion presented itself throughout each of the 
fifty-four days during which this experiment 
was in progress. 

Two tests were administered, both at the
beginning and at the end of the experiment.
Each test contained fifty words. The words
used were too difficult to be found in the
reading vocabulary of most fourth-grade
pupils. The tests were administered by
judges from the Speech Department of The 
Pennsylvania State College. 

Findings.—The achievements of the con-
trol group and the experimental groups on 
The Pronunciation Skills Test, Form B Series, 
are shown in Table I. 

When the scores of the subjects in the 
experimental groups are compared with the 
scores of the subjects in the control group, 
standard error ratios of 4.65, 8.72, and 5.92 
are obtained. These ratios indicate that the 
achievements of the subjects in the experi- 
mental groups are decidedly superior to the 
achievements of the subjects in the control
group. 

Table II shows the achievements of the
control group and the experimental groups on
The Real- and Synthetic-Word Pronunciation
Test, Form A Series. A comparison of the
achievements of Group E₁ with Group C on
the Form A Series test yields a difference
mean of -0.23, as shown by Table II. A
comparison of the achievements of Group E₁
with Group C on the Form B Series test
gives a difference mean of 9.26, as shown by
Table I. Thus, on the Real- and Synthetic-Word
Pronunciation Test, Form A Series, Group E₁
did not do so well.

In Table II the mean differences, as well
as the standard error ratios, for Group E₂ and
Group E₃ are highly significant. Group E₁
did not do as well as the control group. The
symbolization taught to Group E₁, together
with the difference in the difficulty of the
words in the two series of tests, accounts for
the success and the lack of success of Group
E₁ on the two tests. In the opinion of the
writer the Form B Series test was an easier
test than the Form A Series test.

Conclusions.—An evaluation of the data
summarized seems to justify the following 
conclusions: 

1. The ease and success with which fourth- 
grade pupils can use a complete symbolization 
as a phonetic aid in analyzing new words 
seems, in this experiment, to depend largely 
upon the method of instruction, the materials 
of instruction, and the complete symbolization 
employed. 


TABLE I

A COMPARISON OF THE INDEPENDENT WORD-ANALYSIS ACHIEVEMENTS OF THE CONTROL AND
EXPERIMENTAL GROUPS ON THE PRONUNCIATION SKILLS TEST, FORM B SERIES

                           Difference   Standard
Groups                        Mean        Error     Ratio   Significance
Group C and Group E₁          9.26        1.993      4.65   Highly Significant
Group C and Group E₂         10.31        1.182      8.72   Highly Significant
Group C and Group E₃          9.90        1.673      5.92   Highly Significant


TABLE II

A COMPARISON OF THE ACHIEVEMENTS OF THE CONTROL AND EXPERIMENTAL GROUPS IN AN
INDEPENDENT ANALYSIS OF NEW WORDS WHEN THE REAL- AND SYNTHETIC-WORD
PRONUNCIATION TEST, FORM A SERIES, WAS ADMINISTERED

                           Difference   Standard
Groups                        Mean        Error     Ratio   Significance
Group C and Group E₁         -0.23         .737      .312   62 chances in 100 that the true
                                                            difference is on the same side
Group C and Group E₂         16.43        1.366     12.03   Highly Significant
Group C and Group E₃         14.83        1.579      9.39   Highly Significant
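Each ratio in Tables I and II is the difference mean divided by its standard error. The sketch below recomputes that column from the printed difference and standard-error columns and converts each ratio to chances in 100 that the true difference has the observed sign, assuming the normal curve (the abstract does not spell out how its chances were obtained). The dictionary layout and function names are illustrative only.

```python
import math

# Difference mean and standard error for Group C versus each experimental group
TABLE_I = {"E1": (9.26, 1.993), "E2": (10.31, 1.182), "E3": (9.90, 1.673)}
TABLE_II = {"E1": (-0.23, 0.737), "E2": (16.43, 1.366), "E3": (14.83, 1.579)}

def ratios(table):
    """Ratio = difference mean / standard error, rounded to two places."""
    return {group: round(diff / se, 2) for group, (diff, se) in table.items()}

def chances_same_side(ratio):
    """Chances in 100 that the true difference has the observed sign."""
    return round(100 * 0.5 * (1.0 + math.erf(abs(ratio) / math.sqrt(2.0))))

for name, table in (("Table I", TABLE_I), ("Table II", TABLE_II)):
    for group, r in ratios(table).items():
        print(f"{name}, Group C and Group {group}: ratio {r}, "
              f"{chances_same_side(r)} chances in 100")
```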
2. The advantage of the Webster Key over
the International Phonetic Alphabet is more 
apparent than real. 

The results of this study suggest certain 
other indirect implications that may be sum- 
marized as follows: 

1. Fourth-grade pupils can readily learn to 
use any type of complete symbolization if it 
contains one symbol, and only one symbol, 
for each sound in the English language. 

2. Complete learning had not been realized 
when this experiment closed. 

3. These fourth-grade pupils did not seem 
either to tire or to form a dislike for highly 
motivated drill on the blending of the speech 
sounds represented by the symbols in mono- 
syllabic synthetic words. 

4. Fourth-grade pupils should learn the 
word-analysis skills more rapidly when sys- 
tematic instruction and a functional use of 
what is taught can be integrated as the period 
of systematic instruction lengthens. 

5. The acoustic approach, accompanied by 
an organic approach, seems to aid fourth- 
grade pupils in the production and the blend- 
ing of the speech sounds in difficult words. 

6. This experiment seems to indicate that 
carefully constructed synthetic words are 
more desirable for measuring a pupil’s ability 
to analyze new words than are real words. 

7. The judges were in agreement that for 
fourth-grade pupils the fatigue caused by 
analyzing a fifty-word pronunciation test may 
affect the test scores. 

8. If several judges are used in rating the 
pronunciation of pupils in an experiment of 
this type, they should rate each pupil’s pro- 
nunciation of the test words simultaneously. 
If one judge rates any of the pupils in an 
experiment of this type, he should rate all of 
the pupils. 
A FACTORIAL ANALYSIS OF READING ABILITY*


ROSALIND STREEP LANGSAM 
Hunter College, New York City 


The problem.—The purpose of this inves-
tigation is to determine, by factorial
analysis, (1) the factors revealed by a
diversified battery of tests of reading
ability and (2) the loading that each factor
evidences in each of these tests. This study
attempts to isolate those and only those
factors that can be revealed by the battery
of tests used.

Significance of the problem.—Previous
studies have treated the complexity of the
processes underlying reading, and have
pointed to the need both for analyzing read-
ing ability into its components and for con-
structing new objective measures of these
components. If improved measures of read-
ing ability are to be devised, a knowledge of
the basic elements or abilities composing
reading ability is evidently essential to their
construction.

The need for this study is found in the 
hypothetical two-fold application of its find- 
ings: first, to the construction of reading tests 
and, second, to the field of educational guid- 
ance. In regard to the former, the isolation 
of the factors involved in reading and the 
determination of the relative loadings of the 
factors have value as a basis for test con- 
struction. The structure and content of read-
ing tests would be affected, and possibly
determined, by proof of the existence of the
basic components or unitary traits into which
reading could be analyzed. The choice of tests
would be deter-
mined by the factors found to compose read- 
ing ability, and the relative weight assigned 
to each test would be determined by the 
relative and corresponding loading of each 
factor. 

In regard to the latter field, guidance
based upon a knowledge of the factors in-
volved in reading and of their relative
weights, and upon the results of tests
constructed in accordance with this knowl-
edge, would be better informed than hereto-
fore. Furthermore, either independent of the
application of the findings of this study to the 
field of guidance, or more or less indirectly 
related to it through the use of remedial tech- 
niques, specific methods of instruction in 
reading may in turn be developed as a result 
of the information yielded by this investi- 
gation. 

The subjects of the investigation.—The
subjects of this study were one hundred stu- 
dents of the lower freshman class of Hunter 
College of the City of New York. Each sub- 
ject was a graduate of a high school of New 
York City, had been admitted to Hunter 
College in February, 1940, was a resident of 
New York City, and was female. The average 
age of the group tested was seventeen years 
and nine days, with a standard deviation of 
eleven months. Thus, the subjects of the in- 
vestigation were homogeneous as to sex and 
age, and fairly homogeneous as to education. 
This was in accordance with the requirements 
for homogeneity of a factorial analysis study. 

The choice of tests.—The specific tests
chosen for this analysis were those that were 
considered to provide the best means of re- 
vealing the basic elements of which reading 
is composed. The following were the tests 
used: 

The Iowa Silent Reading Test, New Edition, 
Advanced Test, which consists of seven 
parts: 

1. Rate-Comprehension
2. Directed Reading
3. Poetry Comprehension
4. Word Meaning
5. Sentence Meaning
6. Paragraph Comprehension
7. Location of Information


The Minnesota Reading Examination for
College Students, which consists of two
parts:
1. Vocabulary 
2. Paragraph Reading 
The Nelson-Denny Reading Test for Col-
leges and Senior High Schools, which 
consists of two parts: 

1. Vocabulary 
2. Paragraph Reading 


The Minnesota Speed of Reading Test for 
College Students 
The Michigan Speed of Reading Test 
The Inglis Test of English Vocabulary 
The American Council on Education Psycho- 
logical Examination for College Fresh- 
men, 1939 Edition, which consists of six 
tests grouped in two general classes: 
1. Linguistic Tests 
a. Same-Opposite 
b. Completion 
c. Verbal Analogies 
2. Quantitative Tests 
a. Arithmetic 
b. Number Series 
c. Tables 


The Identical Forms Test of the Tests for 
Primary Mental Abilities 


The American Council on Education Psy- 
chological Examination was included in the 
battery of tests used in this study because of 
the aid it was believed this intelligence test 
might offer in isolating the factors underlying 
reading and, further, in interpreting more 
clearly whatever factors might be found. 

The Identical Forms Test, which is one of 
the Thurstone Tests for Primary Mental 
Abilities(16), has evolved as the non-verbal 
measure of the perceptual factor P, which 
may be described as a facility in perceiving 
detail that is imbedded in irrelevant mate- 
rial. This test was included in the battery of 
tests used for the purpose of casting light 
upon the factors of which reading is com- 
posed, and also as a possible means of clari- 
fying the meaning of “speed” of reading. 

For purposes of factorial analysis, each of 
the subtests of the above tests was regarded 
as a specific test. The tests were arranged on 
the basis of part scores where a specific test 
consisted of separate parts, and on the basis 
of a single score for the whole test where a 
specific test yielded only a score for the entire 
test. Thus, each part or whole, as the case 
was, received treatment as a separate test. 

For purposes of simplicity in referring to 
the tests, the names of some of them were 
abbreviated as follows: 
… represents The Iowa Silent Reading Test

MR represents The Minnesota Reading
Examination

… represents The Nelson-Denny Reading
Test

ACE represents The American Council on
Education Psychological Examination


Also, for ease of reference, each test was
numbered as follows:

Number   Name of Test
   1     Rate-Comprehension
   2     Directed Reading
   3     Poetry Comprehension
   4     Word Meaning
   5     Sentence Meaning
   6     Paragraph Comprehension
   7     Location of Information
   8     MR: Vocabulary
   9     MR: Paragraph Reading
  10     Nelson-Denny: Vocabulary
  11     Nelson-Denny: Paragraph Reading
  12     Minnesota Speed of Reading
  13     Michigan Speed of Reading
  14     Inglis Vocabulary
  15     ACE: Same-Opposite
  16     ACE: Completion
  17     ACE: Verbal Analogies
  18     ACE: Arithmetic
  19     ACE: Number Series
  20     ACE: Tables
  21     Identical Forms


Thus the tests comprised twenty-one variables. 
The factorial analysis.—The factorial anal-
ysis required first the intercorrelations of the
twenty-one variables. This involved the com-
putation of 210 intercorrelations, each of
which was computed independently by two
methods: first, by the
regular Pearson product-moment procedure 
and, second, by the cumulative frequency or 
summation method, as a check upon the first 
method. The resulting coefficients of correla- 
tion were set up in the form of a correlational 
matrix, which is presented in Table I. 
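The intercorrelation step described above can be sketched as a product-moment coefficient for every pair of variables, assembled into a symmetric matrix. This is a plain-Python illustration with hypothetical function names; it does not reproduce the cumulative-frequency check the study also used. It also confirms the count of 21 × 20 / 2 = 210 distinct pairs among twenty-one variables.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Symmetric matrix of r for every pair of variables (1.0 on the diagonal)."""
    k = len(columns)
    R = [[1.0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        R[i][j] = R[j][i] = pearson_r(columns[i], columns[j])
    return R

n_pairs = len(list(combinations(range(21), 2)))
print(n_pairs)
```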

The method of factorial analysis used in 
this investigation was Thurstone’s Centroid 
Method(17). In accordance with this method, 
the factorial analysis was made in two stages: 
(1) the factoring of the correlational matrix, 
and (2) the rotation of the axes. The factor- 
ing of the correlational matrix involved the 
estimation of the communalities, the compu- 
tation of factor loadings and residuals accord- 
ing to the demands of the study, and the 
setting up of the resulting centroid matrix. 
Analysis of the correlational matrix yielded 
five significant factors. The centroid matrix, 
consisting of the five unrotated factors ex-
tracted from the correlational matrix, to-
gether with the communalities (represented
by the symbol h²) computed from their load-
ings, is presented in Table II. The centroid
matrix served as the point of departure for
the second stage of the factorial analysis,
the rotation of the axes. The arbitrary
orthogonal reference frame yielded by the
centroid method was rotated with regard to
the configuration of the twenty-one test vectors
in five dimensions so that the coordinate axes
became meaningful. The rotated loadings,
with the communalities computed for the new
factor loadings, are shown as the rotated
factorial matrix in Table III.
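The first stage described above, extracting centroid factors, can be sketched for a single factor. This is a simplified illustration using NumPy, not the study's computation: it assumes the diagonal of the matrix already holds communality estimates (the helper fills in each test's highest correlation, one common convention; the abstract does not say which estimate was used), applies a simple sign-reflection rule, and returns the residual matrix from which the next factor would be extracted. Function names are hypothetical.

```python
import numpy as np

def with_highest_r_diagonal(R):
    """Copy R, replacing each diagonal entry by that test's highest absolute
    off-diagonal correlation (one common communality estimate)."""
    R = np.array(R, dtype=float)
    off = np.abs(R - np.diag(np.diag(R)))
    np.fill_diagonal(R, off.max(axis=0))
    return R

def centroid_factor(R):
    """Extract one centroid factor from R (communality estimates on the
    diagonal). Returns (loadings, residual matrix)."""
    R = np.array(R, dtype=float)
    signs = np.ones(R.shape[0])
    # Reflect tests one at a time until no test's off-diagonal column sum is
    # negative; each reflection increases the grand total, so this stops.
    while True:
        Rs = R * np.outer(signs, signs)
        off_sums = Rs.sum(axis=0) - np.diag(Rs)
        j = int(np.argmin(off_sums))
        if off_sums[j] >= 0:
            break
        signs[j] = -signs[j]
    col_sums = Rs.sum(axis=0)
    # Centroid loading: column sum over the square root of the grand total
    # (assumes the grand total is positive); undo reflections afterwards.
    loadings = signs * col_sums / np.sqrt(col_sums.sum())
    residual = R - np.outer(loadings, loadings)
    return loadings, residual
```

For a matrix that is exactly one factor, the sketch recovers that factor's loadings and leaves a zero residual; with real data the residual still contains the remaining factors.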

The communalities are presented to show
how much of the total variance of each test
is accounted for by the various factor load-
ings. The communality is the sum of the
squares of the factor loadings for each test,
read as a proportion (or percentage) of the
test's total variance. One hundred per cent
minus the communality represents that
variance in the test that is due to a factor or
factors unmeasured by the tests used, and to
errors of sampling.


TABLE II
CENTROID MATRIX

Test      I
  1     .587
  2     .642
  3     .670
  4     .709
  5     .690
  6     .632
  7     .654
  8     .740
  9     .485
 10     .815
 11     .670
 12     .650
 13     .749
 14     .680
 15     .742
 16     .622
 17     .698
 18     .361
 19     .471
 20     .360
 21     .218


TABLE III
ROTATED FACTORIAL MATRIX
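The definition of the communality given above is a one-line computation: h² is the sum of a test's squared loadings, and 1 − h² is the share left to unmeasured factors and error. The sketch below, with hypothetical loadings for one test on five factors, also shows that rotating a pair of axes orthogonally leaves h² unchanged; that property holds only for an orthogonal rotation, which the abstract does not state explicitly for the final frame. Function names are hypothetical.

```python
import math

def communality(loadings):
    """h^2 for one test: the sum of its squared factor loadings."""
    return sum(a * a for a in loadings)

def rotate_pair(loadings, i, j, theta):
    """Rotate factor axes i and j orthogonally through theta radians."""
    rotated = list(loadings)
    c, s = math.cos(theta), math.sin(theta)
    rotated[i] = c * loadings[i] + s * loadings[j]
    rotated[j] = -s * loadings[i] + c * loadings[j]
    return rotated

# Hypothetical loadings of one test on five factors
row = [0.60, 0.30, 0.10, 0.05, 0.00]
h2 = communality(row)
print(f"h2 = {h2:.4f}, remainder = {1 - h2:.4f}")
print(f"h2 after rotating axes I and II by 30 degrees = "
      f"{communality(rotate_pair(row, 0, 1, math.radians(30))):.4f}")
```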

The final step in the factorial analysis was 
the identification or interpretation of the 
factors in relation to the tests used. The 
rotated factorial matrix, found in Table III, 
provided the basis for the presentation of the 
results of this study and for the interpreta- 
tion of the factors found. An examination of 
the tests with significant saturations in the 
various factors and of those with insignificant 
loadings served as the basis of the technique 
of interpretation. A projection or factor load-
ing was regarded as nearly zero if it was in
the range of plus or minus .20, since such a
loading accounts for at most four per cent of
the total variance of the test.

Interpretation of the factors.—For pur-
poses of determining which of the rotated 
loadings of the various tests were significant 
in each factor, use was made of the criterion 
employed by Thurstone(16). Only those 
tests with saturations of about .40 or higher 
were considered as having significant loadings. 
A useful and important check on this proce- 
dure was to examine those tests in which the 
factor was absent or relatively absent as indi- 
cated by insignificant loadings. 
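The two cut-offs used in interpreting the rotated matrix, about .40 for a significant loading and plus or minus .20 for a loading regarded as nearly zero, can be expressed as a small helper. The thresholds come from the text; the label for loadings that fall between them, the function names, and the sample loadings are hypothetical.

```python
def variance_share(loading):
    """Proportion of a test's variance attributable to one factor loading."""
    return loading * loading

def classify_loading(loading, near_zero=0.20, significant=0.40):
    """Label a rotated loading using the cut-offs described in the text."""
    size = abs(loading)
    if size <= near_zero:
        return "nearly zero"
    if size >= significant:
        return "significant"
    return "intermediate"  # hypothetical label for the in-between range

# Hypothetical loadings of three tests on one factor
for test, a in {"Test A": 0.62, "Test B": 0.31, "Test C": -0.09}.items():
    print(test, classify_loading(a), f"{100 * variance_share(a):.1f}% of variance")
```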

1. The first factor was identified as the 
verbal factor V, characteristic of the first 
seventeen tests given above. In all these 
tests, the subjects must deal with ideas, and 
the factor is evidently characterized primarily 
by its reference to ideas and the meanings of 
words. A check on the verbal character of 
this factor was gained by inspection of the 
four remaining tests in which this factor was 
practically absent: Arithmetic, Number 
Series, and Tables (the three quantitative 
tests of the American Council on Education 
Psychological Examination) and the Identical 
Forms Test. These four tests, in contrast to 
the seventeen having significant loadings, are 
non-verbal in character, thus giving confirma- 
tion to the naming of the first factor as the 
verbal factor V. All the tests with significant
saturations in this factor involve the inter-
pretation of language, so the factor is best
characterized as an ability to deal with
verbal material.

2. The second factor was identified as the 
perceptual factor P. All the tests that involve 
speed, with the exception of the Minnesota 
Speed of Reading Test, had significant pro- 
jections on this factor: Rate-Comprehension, 
Word Meaning, Location of Information, 
Michigan Speed of Reading, Verbal Analogies,
and Identical Forms. Poetry Comprehension 
had a somewhat significant loading, which 
may be considered for interpretation along 
with the factor loadings of the six tests indi- 
cated above. The presence of the Identical 
Forms Test offered the basis for the interpre- 
tation of this factor. As previously mentioned, 
this test is a measure of the perceptual factor 
P, which is a function that appears to be a 
facility in perceiving detail. Tests that call 
for this ability require the quick perception 
of detail. Since Identical Forms is a non- 
verbal test, the indications concerning the 
identification of this second factor are that it 
is best described as a speed of perception 
factor. The five other tests having significant
saturations, being verbal in form, might,
when considered apart from Identical Forms,
have led to the identification of a speed of
reading factor. However, the non-verbal char-
acter of Identical Forms removes these tests
from the verbal group for purposes of inter-
pretation. The presence of Word Meaning, which
is the vocabulary test of the Iowa Silent 
Reading Test, in the group having significant 
projections on this axis, in contrast to the 
absence of the three other vocabulary tests of 
the battery, is explained on the basis of the 
comparative easiness of the test. It appeared 
that the function involved here was that of 
quickly perceiving and selecting the correct 
word from the other words offered as possible 
answers, in a manner similar to the use of 
that function in Identical Forms, although 
with verbal content in the former, in contrast 
to non-verbal content in the latter. A similar 
explanation may be applied to the presence 
of this factor in Location of Information, 
Poetry Comprehension, and Verbal Analogies, 
in each case taking into consideration the 
difference in content of each test but, at the 
same time, noting the similarity in the re- 
quirement of the use of the speed of percep- 
tion factor. The absence of the Minnesota 
Speed of Reading Test from the group having 
significant loadings, as compared with the 
presence of the Michigan Speed of Reading 
Test, may be explained on the basis of differ- 
ences in the form and manner of response of 
each of these tests. The latter test appears, 
by its mode of response, to depend upon an 
ability to perceive details quickly, which has 
been named the P factor. In contrast, the
manner of responding required by the Minne-
sota test may actually interfere with the
speed of response, which would account for
the absence of a significant saturation of the
P factor in the Minnesota Speed of Reading
Test.

3. The third factor was interpreted as the 
word factor W. The tests having significant 
projections on this factor were the Vocabulary 
Test of the Minnesota Reading Examination, 
the Vocabulary Test of the Nelson—Denny 
Reading Test, and the Inglis Vocabulary 
Test. The one vocabulary test without a sig-
nificant entry in this factor was Word Mean-
ing, which was examined above in relation to
the perceptual factor. The three vocabulary 
tests having significant loadings in the pres- 
ent factor appear to agree in large part in 
the type of vocabulary involved, which is
drawn from general background knowledge,
and at the same time contrast markedly in 
this content with that of the Word Meaning 
Test, whose words are of a technical nature 
related to specific school subjects. The latter 
test was so easy for the group tested that it 
represented chiefly perceptual speed. An ex- 
amination of the three vocabulary tests hav- 
ing significant saturations in the third factor 
indicates that they appear to involve a verbal 
factor which is separate from the verbal 
factor V. The verbal factor V is concerned 
with ideas and meanings, whereas this other 
verbal factor, described as the word factor 
W, seems to have as its principal characteristic 
a fluency in dealing with words, separate 
from V. It is evident that all three tests deal
with single and isolated words. The factor
loadings also show that each of these tests
has significant loadings in the verbal factor
V as well as significant saturations in the
word factor W. This finding may indicate
that some of the correct
responses on these vocabulary tests were de- 
termined by knowledge of the meanings of 
the words to the extent that the words could 
have been specifically defined. The finding 
may further indicate that some of the other 
correct responses were determined by a 
knowledge of the degree of agreement or 
similarity in meaning of one of the possible 
answers in relation to the stimulus word, 
without, however, sufficient understanding of 
the meaning of the word to be able to 
define it. 

4. The fourth factor was identified as the 
number factor N. This factor was clearly 
limited to the numerical tests of the battery, 
the three tests that constitute the quantita-
tive group of the American Council on Edu-
cation Psychological Examination: Arith-
metic, Number Series, and Tables.

5. The fifth and last factor was tentatively 
identified as the factor of seeing relationships. 
The tests with significant saturations were 
the Paragraph Reading Test of the Minne- 
sota Reading Examination with a markedly 
high loading, the three vocabulary tests that 
were described as involving the third factor, 
the Same-Opposite Test, and the Arithmetic 
Test. In addition, the Sentence Meaning Test 
and the Paragraph Reading Test of the 
Nelson—Denny Reading Test had fairly sig- 
nificant loadings. It is of particular interest 
to note here that Arithmetic had a significant 
projection on this factor, thus providing the 
only evidence of one of the quantitative tests 
having a factor in common with some of the 
non-quantitative tests of the battery used in 
this study. The outstandingly high loading in 
this factor of the Paragraph Reading Test of 
the Minnesota Reading Examination appears 
to be accounted for by the extent to which 
considerable specialized information is essen- 
tial to the adequate assimilation of the mate- 
rial of this test in particular. A careful exam-
ination of the composition of each of the tests
having significant loadings on this factor sug-
gests that their common characteristic is the
seeing of relationships among the elements of
the problem confronting the reader, in the
light of the specialized knowledge summoned
up for the required solution. The presence of
this factor indicates that reading ability in-
volves more than the V, P, and W factors: it
also includes the seeing of relationships,
possibly involving logical organization and
the selection of pertinent ideas.

Contribution of tests to total variance.— 
From the loading of each factor in each of
the tests, the percentage of the total variance
attributable to each factor was computed. For
each test of the battery, the percentage of its
total variance accounted for by the factors
was determined, together with the percentage
attributable to each factor separately.
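The percentages described above can be computed from a loading matrix. The sketch below assumes uncorrelated (orthogonal) factors, so that squared loadings are additive. A factor's share of a single test's variance is then the loading squared. Its share of the whole battery's variance is the column sum of squared loadings divided by the number of tests. The matrix is hypothetical and is not the study's table.

```python
# Hypothetical rotated loadings: rows are tests, columns are factors.
# Illustrative values only, not taken from the study.
LOADINGS = [
    [0.60, 0.10],
    [0.50, 0.40],
    [0.10, 0.70],
    [0.30, 0.20],
]


def percent_of_test_variance(row):
    """Per-factor percentages of one test's variance (loading squared) and
    their total, the part of the test's variance the factors account for."""
    parts = [100 * a * a for a in row]
    return parts, sum(parts)


def percent_of_battery_variance(matrix):
    """Percentage of the whole battery's variance attributable to each factor:
    the column sum of squared loadings divided by the number of tests."""
    n_tests = len(matrix)
    n_factors = len(matrix[0])
    return [
        100 * sum(row[j] ** 2 for row in matrix) / n_tests
        for j in range(n_factors)
    ]


for i, row in enumerate(LOADINGS, start=1):
    parts, total = percent_of_test_variance(row)
    print(f"test {i}: by factor {[round(p, 1) for p in parts]}, total {total:.1f}%")
print("battery:", [round(p, 2) for p in percent_of_battery_variance(LOADINGS)])
```

On this reading, the statement that the number factor N accounts for seven per cent of the variance of the twenty-one-test battery means that the squared N loadings sum to about .07 × 21 = 1.47.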

General conclusions—Five factors were 
yielded by the battery of tests used in this 
study, and were identified as follows: 


1. a verbal factor V 
2. a perceptual factor P 
3. a word factor W 
4. a number factor N
5. a factor tentatively identified as that of 
seeing relationships 


It is to be noted that, although the analysis
yielded five factors, one of them, the number
factor N, was conspicuously absent from the
reading tests of the battery, even though it
accounted for seven per cent of the total
variance of the battery of twenty-one tests
used in this study. Separating N from the
factors underlying reading therefore leaves
four factors as basic to the reading process,
as revealed by this investigation.

From an analysis of the contribution of the 
tests to the variance, it is clear that a single 
test may have significant loadings in one or 
more factors, indicating that the test is com- 
plex rather than simple. Seven of the tests
of the battery yielded only one significant
factor, though not the same factor in all of
them; eleven of the tests yielded exactly two
factors with significant loadings; and each of
the three remaining tests yielded three sig-
nificant factors.
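The tally of simple and complex tests can be sketched by applying the criterion of "about .40 or higher" as a hard cutoff to each test's row of loadings. The test names, the factor order (V, P, W, N, and a fifth factor for seeing relationships), and the loadings below are all hypothetical.

```python
from collections import Counter

THRESHOLD = 0.40  # "about .40 or higher", applied here as a hard cutoff

# Hypothetical loadings for five tests on five factors, in the order
# V, P, W, N, and seeing relationships. Illustrative values only.
LOADINGS = {
    "test A": [0.72, 0.15, 0.05, 0.02, 0.10],
    "test B": [0.55, 0.48, 0.10, 0.00, 0.12],
    "test C": [0.45, 0.08, 0.50, 0.03, 0.41],
    "test D": [0.05, 0.10, 0.02, 0.68, 0.18],
    "test E": [0.38, 0.44, 0.12, 0.05, 0.09],
}


def significant_factors(row, threshold=THRESHOLD):
    """Indices of the factors whose loading magnitude meets the threshold."""
    return [j for j, a in enumerate(row) if abs(a) >= threshold]


# Number of significant factors per test, and how many tests have each count.
complexity = {name: len(significant_factors(row)) for name, row in LOADINGS.items()}
print(complexity)
print(Counter(complexity.values()))
```

In the study itself, the corresponding tally was seven one-factor tests, eleven two-factor tests, and three three-factor tests, covering all 7 + 11 + 3 = 21 tests.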
READING REACTIONS FOR VARIED TYPES OF SUBJECT
MATTER: AN ANALYTICAL STUDY OF THE EYE- 
MOVEMENTS OF COLLEGE FRESHMEN* 


LEWIS GORDON STONE
Illinois State Normal University 


Statement of the problem.—The problem 
was a study of the eye-movements of 64 col- 
lege freshmen while reading various types of 
material. The selections were academic mate- 
rial of average freshman level. The respective 
vocabularies of the reading selections were 
equated. 

Purpose of the study—The purposes of 
this study were two: (1) to determine the 
nature of the eye-movements of college 
freshmen while reading different types of 
subject material and to identify relationships 
that may exist between eye-movements and 
various types of subject matter during the 
reading process, and (2) to determine the 
effect that various parts of a selection have 
on eye-movements as subjects progress
through a 300-word reading selection.

Delimitation of the study—This investi-
gation was confined to eye-movements while
reading various types of material in the fol-
lowing fields: (A) arithmetic (mathematical
concepts), (B) biological science, (C) Eng-
lish (expository prose), (D) educational psy-
chology, (E) physical science, and (F) social
science.

The study of eye-movements was restricted 
to the photographic film record taken by the 
Ophthalm-O-Graph. 

Sixty-four New York University freshmen 
students in the School of Education served as 
subjects in the research. 

Each subject’s eye-movements were re- 
corded while he read 1,850 words. This read- 
ing material included paragraph selections, 
300 words in length, from each of the six 
fields mentioned above and one standardized 
paragraph selection of 50 words in length. 

On the basis of an intelligence test, the
subjects were found to be representative
freshman students.


* Abstract of a doctoral dissertation, New York University,
1941. Acknowledgment is made of the guidance of Professors
Ernest R. Wood, Robert K. Speer, and Ambrose H. Suhrie
in carrying out the investigation.


Experimental procedure—The 300-word 
selections were referred to as paragraphs A, 
B, C, D, E, and F respectively. The stand- 
ardized selection of 50 words was referred to 
as paragraph G. Paragraph G was used as a 
preliminary selection. It was the first reading 
selection given to each individual. 

At least two New York University School 
of Education professors were selected for 
each of the six fields, to aid the investigator 
in selecting the reading paragraphs. Each 
professor was a specialist in the field for which 
he was chosen. The professors of a given field 
were consulted during the selecting of the 
paragraphs. They chose from recent books in 
their respective fields, whenever possible, typi-
cal freshman reading materials. The six selec-
tions finally accepted were comparable in
difficulty of vocabulary and in length of
sentence. The vocabulary of each
selection was canvassed in terms of Thorn- 
dike’s Teachers’ Word Book of 20,000 Words. 

The selections were printed on cards. Ten-
point type was used. Spacing between
words was 3½ points. The length of the line
was 27 picas. The selections averaged 11.7
words per line.

A standard, uniform procedure of handling 
the subjects and giving directions was used 
throughout the tests. Three different sittings, 
with at least one day intervening between 
successive sittings, were used in photograph- 
ing the eye-movements of each subject. Two 
selections were administered at each sitting. 
The order of the paragraphs read by each 
subject was determined by chance so that 
there would be no fixed order. 

The following eye-movement measures 
were used: (1) rate of reading (words per 
minute); (2) number of fixations (including 
regressions) per 100 words; (3) number of 
regressions per 100 words; (4) time in sec-
onds to read 100 words; (5) average span
of recognition (in words); and (6) average
duration of fixations (in seconds). The com-
prehension score was also used. This score
was based on the percentage of questions 
answered correctly. A thorough comprehen- 
sion of the material read was demanded. 
Fourteen questions of the objective type were 
used for each of the six paragraphs. Eight 
were true-false and six were of the completion 
type. These questions were such as to in- 
volve understanding as well as knowledge. 
Questions were answered in writing. After the 
tests were administered and corrected, the 
discriminating value of each item was estab- 
lished in terms of the critical ratio of the dif- 
ference between the upper 25 per cent and 
the lower 25 per cent of the subjects on the 
basis of the comprehension test scores. Any
item with a critical ratio (D/PE_D, the differ-
ence divided by its probable error) of less than
2.24 was discarded. Between 10 and 13 test
items of each selection met at least this re-
quirement, and the comprehension scores were
based on these items. The comprehension test
for paragraph G included 10 true-false items.
After the discriminating value of these items
was established, four of the original ten items
were discarded because their critical ratios
fell below 2.24. Comprehension scores for this
paragraph were based on the remaining six
items.
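The text does not say how the probable error of the difference was obtained. The sketch below makes three assumptions. First, D is the difference between the proportions answering an item correctly in the upper and lower quarters, which is 16 subjects each out of 64. Second, the standard error of D is that for two independent proportions. Third, PE_D = 0.6745 × that standard error. The function names are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)


def discrimination_ratio(passed_upper, passed_lower, n_upper, n_lower):
    """Critical ratio D/PE_D for one item, where D is the difference between
    the proportions of the upper and lower groups answering correctly."""
    p1 = passed_upper / n_upper
    p2 = passed_lower / n_lower
    d = p1 - p2
    # Standard error of a difference between two independent proportions.
    se_d = math.sqrt(p1 * (1 - p1) / n_upper + p2 * (1 - p2) / n_lower)
    if se_d == 0:
        # Both groups all right or all wrong: the ratio is unbounded or zero.
        return math.copysign(math.inf, d) if d else 0.0
    return d / (PE_FACTOR * se_d)


def keep_item(passed_upper, passed_lower, n_upper=16, n_lower=16, cutoff=2.24):
    """Retain the item only if its critical ratio is at least `cutoff`."""
    return discrimination_ratio(passed_upper, passed_lower, n_upper, n_lower) >= cutoff


# Hypothetical items: 14 of 16 vs 8 of 16 correct, and 10 of 16 vs 8 of 16.
print(keep_item(14, 8), keep_item(10, 8))
```

Under these assumptions, the first hypothetical item has a ratio of about 3.7 and is kept. The second has a ratio of about 1.1 and is discarded.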


Summary of findings.—The average com-
prehension score for each subject was paired
with the average rate of reading for each sub-
ject. The correlation coefficient between com-
prehension and rate for the 64 subjects was
.162. The correlation varied when comprehen-
sion scores were paired with rates of reading
based on different parts of the selections.
When the rate was based on the reading of
the second 100 words, r was .115; when it was
based on the reading of the first 100 words,
r was .292. This range in r, from .115 to .292,
existed even though the eye-movement meas-
ures for the second 100 words were signifi-
cantly more efficient than those for the first
100 words, with the exception of duration of
fixation.
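The coefficients above are product-moment correlations between paired scores: each subject's average comprehension score and his average rate. Here is a minimal sketch. The five readers' numbers are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between paired scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical averages for five readers (not the study's data).
comprehension = [78.0, 85.0, 64.0, 90.0, 71.0]        # per cent correct
rate_first_100 = [210.0, 245.0, 180.0, 300.0, 190.0]  # words per minute
print(round(pearson_r(comprehension, rate_first_100), 3))
```

To repeat the comparison in the text, the same comprehension list would be paired in turn with rates based on the first 100 words, the second 100 words, and the whole selection.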

Findings from the phase of the study in 
which measures were based on the reading of 
300 words showed that significant differ- 
ences existed between selections, even though 
the selections were comparable in vocabulary 
difficulty and all on the same academic level. 
These differences occurred in mean rate, mean 
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number of fixations, mean number of regres- 
sions, and mean span of recognition, but did 
not occur in mean duration of fixations. Para- 
graph D was read with more efficient eye- 
movements than any of the other selections. 
The means for the group on this paragraph
were: rate, 267.34 ± 12.95; fixations, 85.61
± 2.18; regressions, 13.00 ± 1.52; span,
1.24 ± .0401; and duration, .302 ± .0127.
Paragraph B was read with the least efficient
eye-movements. The means for the group on
this paragraph were: rate, 212.83 ± 10.14;
fixations, 94.66 ± 2.683; regressions, 16.91
± 1.16; span, 1.11 ± .0346; and duration,
.350 ± .0160.

There were significant differences between 
paragraphs D and B in all measures, and the
differences favored paragraph D. The differ-
ences, expressed as critical ratios (D/σD),
were found by dividing each obtained differ-
ence by the standard error of the difference.
The critical ratios for the various
measures were: rate, 3.33; fixations, 2.62; 
regressions, 2.04; span, 2.45; and duration, 
2.00. Critical ratios between paragraphs C 
and B were as follows: rate, 3.04; fixations, 
2.24; regressions, 1.40; span, 1.92; and dura- 
tion, 1.88. The difference with respect to 
each measure favored paragraph C. The only 
significant difference found between para- 
graphs D and A was in regressions. The criti- 
cal ratio for this measure was 2.04 and fav- 
ored D. A critical ratio of 1.98 existed be- 
tween paragraphs D and E in rate, the dif- 
ference favoring D. In all other measures the 
differences between paragraphs D and E were 
not significant. There was a significant dif- 
ference in rate between paragraphs A and B, 
in favor of A, with a critical ratio of 2.27. 
The differences were not significant in any of 
the other measures between A and B. 
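Below is a minimal sketch of the critical ratio D/σD. It assumes that the ± values printed for paragraphs D and B are standard errors of the means. It also assumes σD = √(σ1² + σ2²), which treats the two means as uncorrelated. With these inputs, the sketch reproduces the reported ratios for fixations (2.62), regressions (2.04), and span (2.45). It does not exactly reproduce the ratios for rate and duration. The author's precise formula is therefore not certain; for example, a correlation term may have been included, since the same subjects read both paragraphs.

```python
import math


def critical_ratio(mean1, se1, mean2, se2):
    """D / sigma_D, where sigma_D = sqrt(se1**2 + se2**2).
    Assumes the two means are uncorrelated (no correlation term)."""
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)
    return (mean1 - mean2) / sigma_d


# Means and their +/- values for paragraphs D and B, as printed in the text.
D = {"fixations": (85.61, 2.18), "regressions": (13.00, 1.52), "span": (1.24, 0.0401)}
B = {"fixations": (94.66, 2.683), "regressions": (16.91, 1.16), "span": (1.11, 0.0346)}

for measure in ("fixations", "regressions", "span"):
    cr = abs(critical_ratio(*D[measure], *B[measure]))
    print(f"{measure}: {cr:.2f}")
# fixations: 2.62
# regressions: 2.04
# span: 2.45
```

The absolute value is taken because fewer fixations and regressions are more efficient, while a larger span is more efficient.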

A number of significant differences resulted 
when the eye-movement measures were based 
on different parts of selections. In all meas-
ures except duration, eye-movements were
most efficient in the third hundred words,
slightly more so than in the second hundred
words.

It may be seen in Table III that there were 
significant differences between the first hun- 
dred and second hundred words in all meas- 
ures except duration of fixation. The differ- 
ences between the first and second 100 word 
groups were not as great in selections C and 
F as they were in the other four selections.
The two selections, C and F, were narrative 
in nature. Selections A, B, D, and E were 
scientific in content. When measures were 
based on entire selections of 300 words they 
were less efficient than when based on the 
third hundred words. 
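Here is a minimal sketch of how per-part measures might be computed. It assumes each 100-word part yields a reading time and counts of fixations and regressions. The record structure, the numbers, and the definition of span as words per fixation are assumptions, not details given in the text. One consequence follows if the rate for the entire selection is total words divided by total time. In that case it equals the harmonic mean of the three part rates, so it is pulled toward the slower first hundred words.

```python
from dataclasses import dataclass
from statistics import harmonic_mean


@dataclass
class PartRecord:
    """Counts read from the film record for one part of a selection
    (a hypothetical structure for illustration)."""
    words: int
    seconds: float
    fixations: int    # includes regressions, as in the study's measure (2)
    regressions: int


def measures(parts):
    """Eye-movement measures over one or more parts taken together."""
    words = sum(p.words for p in parts)
    seconds = sum(p.seconds for p in parts)
    fixations = sum(p.fixations for p in parts)
    regressions = sum(p.regressions for p in parts)
    return {
        "rate_wpm": 60 * words / seconds,
        "fixations_per_100": 100 * fixations / words,
        "regressions_per_100": 100 * regressions / words,
        "seconds_per_100": 100 * seconds / words,
        "span_words": words / fixations,  # assumed: words per fixation
    }


# One hypothetical reader on one 300-word selection, split into hundreds.
reader = [
    PartRecord(100, 30.0, 105, 20),
    PartRecord(100, 24.0, 85, 14),
    PartRecord(100, 23.0, 82, 13),
]
part_rates = [measures([p])["rate_wpm"] for p in reader]
whole = measures(reader)
print([round(r, 1) for r in part_rates])   # [200.0, 250.0, 260.9]
print(round(whole["rate_wpm"], 1))         # 233.8
print(round(harmonic_mean(part_rates), 1)) # 233.8
```

Here the rate for the whole selection (233.8) falls below the arithmetic mean of the part rates, and well below the third-hundred rate. This agrees in direction with the finding that measures based on entire selections were less efficient than those based on the third hundred words.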

Eight pairings are shown in Table IV. Sig- 
nificant differences in rate were found in four 
cases. Significant differences were found in 
fixations in seven out of eight pairings. Sig- 
nificant differences were found for regressions 
in four pairings. Significant differences were 
found for span of recognition in six pairings. 
No significant differences were found for dura- 
tion of fixations. Differences were greater, 
between parts, for number of fixations and 
span of recognition than for other measures. 
It is interesting to note that, in all eight pair-
ings in Table IV, duration of fixation favored
the part opposite to the one favored by the
measures of rate, fixations, regressions, and
span. There appears to be a relationship
between longer fixations and efficient eye-
movements in the other measures. The criti-
cal ratios between the entire selection and the
third hundred words were: fixations, 2.18;
span, 2.40; rate, 1.31; and regressions, 1.63.
The differences in the latter two measures
are appreciable but not significant.

The greatest differences in measures be- 
tween the first and second hundred-word 
groups were found in number of fixations, 
number of regressions, and span of recogni- 
tion. 

The differences occurring in fixations and 
span were greater than those occurring in re- 
gressions, when comparisons were made be- 
tween parts of selections. There was no indi- 
cation of increased efficiency due to a shorter 
mean duration from one group of 100 words 
to the next. Duration of fixation showed a 
tendency to increase slightly as the readers 
progressed through the selections. There was, 
however, a very slight decrease during the 
reading of the third hundred words. 

Individual and group differences were large
between different selections as well as within
the same selection. When measures were com-
puted on the first hundred words, the mean
rate for paragraph B, the selection read slow-
est overall, was 179.77 words per minute, and
the rate for paragraph D, the selection read
fastest overall, was 233.66 words per minute,
a difference of 53.89 words per minute. On
the third hundred words, however, these same
two selections differed by 77.74 words per
minute: the mean rate for paragraph B was
215.59, and for paragraph D was 293.33.

The largest difference in mean rate occurred 
in the reading of the first hundred words of 
paragraphs B and C. The former showed a 
mean rate of 179.77 and the latter a mean 
rate of 260.08. The difference was 80.31 
words per minute. The values presented in 
the following summary resulted from comput- 
ing each individual’s average score for the 
six different selections. The means were found 
from the 64 individual averages. 


1. Mean rate when measures were based on the reading of:
(a) 300 words, 243.63 ± 8.69; S.D., 69.0; range, 109-419.
(b) 1st 100 words, 219.34 ± 7.76; S.D., 61.6; range, 102-389.
(c) 2nd 100 words, 254.47 ± 8.88; S.D., 70.5; range, 117-426.
(d) 3rd 100 words, 260.50 ± 7.49; S.D., 78.2; range, 109-451.
2. Mean number of fixations when measures were based on:
(a) 300 words, 89.17 ± 2.31; S.D., 18.37; range, 55-139.
(b) 1st 100 words, 100.68 ± 2.53; S.D., 20.06; range, 67-164.
(c) 2nd 100 words, 82.03 ± 2.11; S.D., 16.71; range, 52-132.
(d) 3rd 100 words, 81.15 ± 2.85; S.D., 22.66; range, 48-126.
3. Mean number of regressions when measures were based on:
(a) 300 words, 15.41 ± 1.033; S.D., 8.20; range, 3.0-45.0.
(b) 1st 100 words, 19.95 ± 1.420; S.D., 11.25; range, 6.0-54.0.
(c) 2nd 100 words, 14.47 ± 0.943; S.D., 7.49; range, 2.0-40.0.
(d) 3rd 100 words, 13.23 ± 0.845; S.D., 6.71; range, 1.5-34.0.
4. Mean span of recognition when measures were based on:
(a) 300 words, 1.19 ± .0309; S.D., 0.245; range, 0.72-1.84.
(b) 1st 100 words, 1.07 ± .0243; S.D., 0.193; range, 0.64-1.51.
(c) 2nd 100 words, 1.28 ± .0363; S.D., 0.288; range, 0.77-1.82.
(d) 3rd 100 words, 1.30 ± .0340; S.D., 0.270; range, 0.80-2.10.
Diagram 1. Mean Rate of Reading for the 1st 100, 2nd 100, 3rd 100,
and the Entire 300 Words for Six Selections.

Diagram 3. Mean Number of Regressions for the 1st 100, 2nd 100, 3rd
100, and the Entire 300 Words for Six Selections.

Diagram 5. Mean Duration of Fixation for the 1st 100, 2nd 100, 3rd
100, and the Entire 300 Words for Six Selections.
5. Mean duration of fixation when measures were based on:
(a) 300 words, .3217 ± .0095; S.D., .0755; range, .211-.570.
(b) 1st 100 words, .3210 ± .0093; S.D., .0735; range, .215-.545.
(c) 2nd 100 words, .3350 ± .0091; S.D., .0721; range, .230-.590.
(d) 3rd 100 words, .3300 ± .0096; S.D., .0762; range, .220-.560.

6. Mean number of fixations per line (average number of words per line was 11.7) based on the reading of:
(a) 300 words, 9.80.
(b) 1st 100 words, 10.90.
(c) 2nd 100 words, 9.42.
(d) 3rd 100 words, 9.00.

7. Mean number of regressions per line based on the reading of:
(a) 300 words, 1.80.
(b) 1st 100 words, 2.30.
(c) 2nd 100 words, 1.67.
(d) 3rd 100 words, 1.53.
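The summary statistics above were computed from the 64 individual averages. The ± values are consistent with the standard error of a mean taken as S.D./√(N − 1), where the S.D. divides by N. For example, 69.0/√63 ≈ 8.69 and 18.37/√63 ≈ 2.31. This formula is inferred from the printed numbers rather than stated in the text, and one entry (± 7.49 for the rate on the third hundred words) does not fit it. The helper names below are hypothetical.

```python
import math


def individual_average(scores):
    """One subject's mean over the six selections."""
    return sum(scores) / len(scores)


def group_summary(per_subject_scores):
    """Mean, S.D., range, and standard error of the mean of individual averages.
    The S.D. divides by N; the standard error divides the S.D. by sqrt(N - 1)."""
    averages = [individual_average(s) for s in per_subject_scores]
    n = len(averages)
    mean = sum(averages) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in averages) / n)
    return {
        "mean": mean,
        "sd": sd,
        "se": sd / math.sqrt(n - 1),
        "range": (min(averages), max(averages)),
    }


def se_from_sd(sd, n):
    """Standard error of a mean from a divide-by-N S.D."""
    return sd / math.sqrt(n - 1)


# Check against two printed entries (N = 64).
print(round(se_from_sd(69.0, 64), 2))   # 8.69, rate on 300 words
print(round(se_from_sd(18.37, 64), 2))  # 2.31, fixations on 300 words
```

Dividing a divide-by-N S.D. by √(N − 1) gives the same result as dividing the divide-by-(N − 1) S.D. by √N.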


Conclusions.—Significant differences in in- 
dividual and group measures of eye-move- 
ments in rate, fixations, regressions, and span 
appear when different types of academic mate- 
rials are read, even though the difficulty of 
the vocabulary and the length of the sentences 
remain constant. However, duration of fixa- 
tion was not significantly affected. There is a 
slight increase, in group results, in duration 
of fixation accompanying increased efficiency 
of other eye-movement measures.


There is a speeding-up process in reading
covering three successive hundred-word
groups. This speeding-up process is accom-
panied by increased efficiency in all eye-move-
ment measures except duration of fixation.

There are two distinct patterns of the
speeding-up process. One of these patterns oc-
curs in the expository prose, story-form type¹
of material. The rate is relatively fast for the
first hundred words, with a slight increase of
about 10 words per minute for the second
hundred, and no further increase in the third
hundred words. The other pattern occurs in
the scientific type² of material. The rate is
relatively slow for the first hundred words,
followed by a significant increase for the sec-
ond hundred words and a slight increase,
about 10 words per minute, in the third hun-
dred words.


¹ Paragraphs C and F.
² Paragraphs A, B, D, and E.
The first hundred words of a selection hav-
ing scientific content are read with less effi- 
cient eye-movements, exclusive of duration of 
fixation, than the first 100 words of selections 
having a narrative style. Apparently there are 
factors present in a selection having a scien- 
tific content that are not present in a selection 
with a narrative style. 

The marked differences existing in the first 
100 words, between the eye-movement meas- 
ures for a selection having a narrative style 
and a selection with a scientific content, tend 
to disappear during the reading of the sec- 
ond and third hundred words. 

An increase in rate of reading within a given
selection is accompanied by, and is due more
to, fewer forward fixations and a larger span
of recognition than a decrease in the number
of regressions.

Inefficient eye-movements occur during the 
reading of lines composed of the frequently 
used words, words appearing among the first 
5,000 in frequency in the Thorndike list. 

There is not a close relationship between 
the difficulty of words, based on Thorndike’s 
list, and the duration of fixations or the 
number of regressions. 

If a good average sample of the reading 
ability of a college student is to be secured, 
the film record of the second or third hun- 
dred words should be used. 


September, 1941] EYE MOVEMENTS OF COLLEGE FRESHMEN







AN EMPIRICAL EVALUATION OF THE ACCOMPLISHMENT
QUOTIENT: A FOUR-YEAR STUDY AT THE
JUNIOR HIGH SCHOOL LEVEL¹


Lipa HARMER HAGGERTY 
Riverside, California 


INTRODUCTION 


Criticisms of the accomplishment quotient began soon after the popularization of the technique² and have continued from time to time down to the present,³ but there is a paradox in the status of the accomplishment quotient. All that has been contributed against it over a period of nearly twenty years has not sufficed to remove it from actual practice.

It continues, to a considerable extent at least, not only in our common schools but in our colleges and universities as well. One encounters it especially in the currently popular research studies of the relationship between motivation and personality adjustment.⁴ All of these appear to employ some form of the concept, and some of them obviously accept it as it was originally used, obtaining their measure of the relationship between capacity and achievement by means of a single sampling of each.

It can still be found in textbooks in preferred use in our college classrooms, books of recent date and of excellent standing. Sets of instructions and practice exercises accompany its presentation, and prospective teachers are urged to familiarize themselves with the methods of deriving it and with its various applications.⁵ In some instances it is referred to as the most valuable information that the teacher can have of the student.⁶

¹ This article is a part of a dissertation submitted to the Department of Education of the University of California in candidacy for the degree of Doctor of Philosophy, 1941.

² Herbert A. Toops and P. M. Symonds, “What Shall We Expect of the Accomplishment Quotient?” Journal of Educational Psychology, XIII (December, 1922), 513-28; XIV (January, 1923), 27-38.

³ Edward E. Cureton, “The Accomplishment Quotient Technique,” Journal of Experimental Education, V (March, 1937), 315-326.

⁴ Walter R. Hepner, “Factors Underlying Unpredicted Scholastic Achievement of College Freshmen,” Journal of Experimental Education, VII (March, 1939), 159-98.

Ross Stagner, “The Relation of Personality to Academic Aptitude and Achievement,” Journal of Educational Research, XXVI (May, 1933), 648-660.

Mary E. Sarbaugh, “Effect of Home Surroundings on Academic [...],” Studies in Articulation of High School and College, pp. 245-276. University of Buffalo Studies, XIII. Buffalo, New York: University of Buffalo, 1936.

Mazie Earle Wagner, “Studies in Academic Motivation,” Studies in Articulation of High School and College, pp. 194-209. University of Buffalo Studies, IX. Buffalo, New York: University of Buffalo, 1936.

Such a situation makes it clear either that the practical value of the concept outweighs its theoretical objections, or that the criticisms against it have not been brought home to educational workers with sufficient emphasis.

Current opinion in school circles reveals an 
amazing variety of viewpoints concerning it, 
from those which appear quite unaware that 
criticisms have been made against it, to those 
which claim that enough has been published 
to discredit it. Two viewpoints seem to pre- 
vail most commonly among the many. One 
of these is that the literature on the subject 
is too technical for practical consumption and 
too confined to the realm of theory; the other, 
that the principal objection to the AQ is its 
questionable reliability, but that large and 
representative samplings remove this objec- 
tion and justify the procedure. 

The literature, like current opinion, runs to both extremes, from studies which accept the AQ and use it unquestioningly⁷ to those which would abandon it entirely.⁸ Between the two are many compromises, some approving the concept with reservations and others suggesting modifications in the technique and its principles.⁹ Critics differ among themselves, attacking each other’s procedures and conclusions,¹⁰ and while it may be true that some of their differences are more verbal than substantial, still the appearance of discrepancy serves further to heighten confusion, so that a student takes stock of his varied findings in the classroom, the seminar, and the library, and concludes that more might be done with profit on the subject of the accomplishment quotient.

⁵ Harry A. Greene and Albert N. Jorgensen, The Uses and Interpretation of High School Tests, pp. 241-245. New York: Longmans, Green and Co., 1938.

⁶ Charles E. Skinner, Educational Psychology, p. 513. New York: Prentice-Hall, 1936.

⁷ See footnote number 4.

⁸ Truman L. Kelley, Interpretation of Educational Measurements, pp. 202-09. New York: World Book Co.

⁹ Cureton, op. cit.

¹⁰ Harl R. Douglass and C. L. Huffaker, “Correlation between Intelligence [...] Accomplishment Quotient,” [journal title illegible], XIII (February, 1929), 78.


THE PROBLEM 


It was the purpose of the present study: 
(1) to secure an unparalleled sampling of in- 
telligence and achievement in a true teaching- 
learning situation; (2) to compute separate 
and consecutive distributions of AQ’s for the 
same group of pupils over the entire testing 
period; (3) to compute a composite AQ dis- 
tribution for the same sampling; and (4) to 
unite these two approaches to the problem 
in an effort to afford conclusive demonstration 
of the nature of the accomplishment quotient 
on an empirical basis. 

If the concept is guilty of the faults charged against it, educators can ill afford to continue using it. More should be done in such a case “to stop the ball from rolling.” It is believed that the present inquiry finds partial justification in the nature of its approach to the problem. It seeks to avoid the technical and the theoretical by simply putting the AQ to work in an actual school situation, following it along over a period of four full years, and noting the nature of its behavior. The study is an effort to test empirically Kelley’s statistical finding that there is a 90 per cent communality between our so-called measures of “intelligence” and “achievement.”¹¹ It is an effort, also, to discover what advantages, if any, can be attributed to the RAQ (Regression Accomplishment Quotient) over the AQ when these particular data are used. And, finally, it is an inquiry into the oft-encountered contention that a large enough sampling will ensure reliability and thereby remove the only real objection to the AQ procedure.
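Kelley’s finding bears directly on the reliability contention. As a minimal sketch, assuming both measures are expressed as standard scores with equal variances (as in Pintner’s difference method rather than the quotient itself), the classical formula for the reliability of a difference is ((r_xx + r_yy)/2 − r_xy)/(1 − r_xy). The formula is not part of this study; the function name and the illustrative values below are assumptions, chosen to resemble the reliabilities reported later in the article.

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of X - Y for two measures expressed in standard scores.

    r_xx, r_yy: reliabilities of the two measures
    r_xy:       correlation between the two measures
    """
    if not -1.0 <= r_xy < 1.0:
        raise ValueError("r_xy must lie in [-1, 1)")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


if __name__ == "__main__":
    # Illustrative reliabilities of the size reported in this study (.96 and .93),
    # paired with increasingly high intelligence-achievement correlations.
    for r_xy in (0.70, 0.80, 0.90):
        print(f"r_xy = {r_xy:.2f} -> reliability of difference = "
              f"{difference_reliability(0.96, 0.93, r_xy):.2f}")
```

With reliabilities near .95, the difference remains fairly reliable while r_xy is .70, but it falls to about .45 once r_xy reaches .90. That is the kind of collapse a high communality between “intelligence” and “achievement” would produce, whatever the size of the sample.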

The study was particularly prompted by the opportunity for access to an altogether extraordinary sampling of intelligence and achievement, extraordinary from the standpoint of both quantity and quality. The writer knows of no other inquiry making use of sixteen consecutive distributions of AQ’s for the same subjects over a four-year testing period. These separate distributions permit detailed and comparative investigation, while the composite distribution guarantees high reliability and shows the AQ under optimum conditions. As for the quality of the sampling, it should be said that the data were collected by the University of California Institute of Child Welfare in connection with “A Growth Study of Adolescents.” Every detail of the program was planned and carried forward by experts in the field, chosen because of their special ability and experience.

¹¹ Kelley, op. cit.


THE DATA 


Intelligence data.—Results of eight administrations of intelligence tests: CAVD, one; Kuhlman-Anderson, two; Stanford Binet (1916 revision), one; and Terman Group, four.

Educational data.—Results of eight admin- 
istrations of the New Stanford Achievement, 
Advanced Examination (four of Form V and 
four of Form W) using the following sub- 
tests: Paragraph Meaning, Word Meaning, 
Arithmetic Reasoning, Arithmetic Computa- 
tion, Language Usage, Literature, History- 
Civics, and Geography. 


THE SUBJECTS 


One hundred and seventy-one pupils, about equal numbers of boys and girls, of the Claremont Junior High School, Oakland, California, participated.¹² These constitute a fair cross-section of the Oakland public school population of comparable age and grade, though the staff members who conducted the Adolescent Study grant that they may be a little superior in intelligence and somewhat more homogeneous than the group of which they are a part. The difference, however, is in no case reliable: none of the criteria used to check the factor of selection showed statistical significance. At the time of the first testing the pupils were in high sixth grade and low seventh, with the sixth scattered through five elementary schools of the city and the seventh at Claremont, grouped homogeneously with respect to ability. For the remainder of the study all pupils were at Claremont.


TIME OF TEST ADMINISTRATIONS


Intelligence tests, higher grade.—Kuhlman-Anderson, Spring 1932; CAVD, Spring 1933; Stanford Binet, Spring 1933; Terman Group (Forms A and B), Spring 1933; Kuhlman-Anderson, Fall 1934; Terman Group (Form A), Spring 1935; Terman Group (Form B), Spring 1935; Terman Group (Form B), Fall 1935.

¹² The number was reduced to 163 for the work with the AQ.

Intelligence tests, lower grade—CAVD, 
Fall 1932; Kuhlman-Anderson, Spring 1933; 
Stanford Binet, Fall 1933; Terman Group 
(Forms A and B), Fall 1933; Kuhlman-An- 
derson, Spring 1935; Terman Group (Form 
A), Fall 1935; Terman Group (Form B), 
Fall 1935; Terman Group (Form B), Spring 
1936. 

Achievement tests (New Stanford through- 
out).—Administrations were regular and uni- 
form. For the higher grade, sub-tests 4, 5, 6, 
7 (V and W) were given each year in the 
Spring and 1, 2, 9, 10 (V and W) each year 
in the Fall. For the lower grade, sub-tests 4, 5, 6, 7 (V and W) were given each year in the Fall and 1, 2, 9, 10 (V and W) each year in the Spring.


DEFINITIONS AND ABBREVIATIONS 


For convenience and clarity the following 
are used: 

TGT A+B = Terman Group Test, 1933 administration, combined scores of Forms A and B (not an average). Scores for the separate forms were not available because of particular circumstances in the test administration. This information was contained in a note attached to the scores.

TGT A = Terman Group Test, second administration, Form A only.

TGT B1 = Terman Group Test, third administration, Form B only.

TGT B2 = Terman Group Test, fourth administration, Form B only.

KA1 and KA2 = first and second administrations of the Kuhlman-Anderson.

AQ = EQ divided by IQ (multiplied by 100). The accomplishment quotient is used in a general sense to mean the measure of relationship between intelligence and achievement. When the discussion concerns some particular method, the reader will be so informed.

Pintner’s Difference Method = EQ standard score minus IQ standard score.

RAQ = Regression Accomplishment Quotient, as expounded in particular by Cureton:¹³ EQ divided by estimated EQ (the EQ predicted from IQ by regression), multiplied by 100.

Higher Grade and Lower Grade = terms used to clarify the timing of the test administrations and to keep the achievement measures uniformly close in time to the intelligence measures. After the AQ’s were computed, the two grades were combined into single distributions and so treated throughout the rest of the study.

¹³ Cureton, op. cit.
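The three measures defined above can be sketched as follows. This is an illustration only: the function name, the hypothetical scores, and the use of an ordinary least-squares regression of EQ on IQ to obtain “estimated EQ” are assumptions, not the article’s own worksheets.

```python
from statistics import mean, pstdev


def quotients(iq, eq):
    """Compute three accomplishment measures for paired IQ and EQ scores.

    AQ      = 100 * EQ / IQ
    RAQ     = 100 * EQ / estimated EQ, where estimated EQ is taken (an assumption
              about the exact procedure) from the least-squares regression of EQ on IQ
    Pintner = z(EQ) - z(IQ), the difference of standard scores
    """
    m_i, m_e = mean(iq), mean(eq)
    s_i, s_e = pstdev(iq), pstdev(eq)
    cov = sum((i - m_i) * (e - m_e) for i, e in zip(iq, eq)) / len(iq)
    slope = cov / (s_i * s_i)  # regression coefficient of EQ on IQ
    est_eq = [m_e + slope * (i - m_i) for i in iq]
    aq = [100 * e / i for i, e in zip(iq, eq)]
    raq = [100 * e / est for e, est in zip(eq, est_eq)]
    pintner = [(e - m_e) / s_e - (i - m_i) / s_i for i, e in zip(iq, eq)]
    return aq, raq, pintner


# Hypothetical scores for five pupils (not data from this study).
iq = [90, 100, 110, 120, 130]
eq = [95, 98, 108, 115, 124]
aq, raq, pintner = quotients(iq, eq)
for row in zip(iq, eq, aq, raq, pintner):
    print("IQ %3d  EQ %3d  AQ %6.1f  RAQ %6.1f  Pintner %+5.2f" % row)
```

Because the regression line passes through the two means, the RAQ’s of any distribution average close to 100, and the Pintner differences average exactly zero; the AQ has no such built-in centering.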


TREATMENT OF THE DATA: THE 
ANALYTICAL APPROACH 


Preliminary analysis and evaluation of the 
measures: (1) of intelligence and (2) of 
achievement.—Two questions present them- 
selves at the outset of a study such as the 
present one: (1) How reliable are the meas- 
ures? (2) How consistent are these measures 
from year to year? An attempt was made to 
answer these questions in the following man- 
ner and with the designated results: 

Reliabilities of intelligence measures.—By the Spearman-Brown formula (raised to full length): CAVD = .93 ± .0069; KA1 = .96 ± .004; KA2 = .96 ± .004 (taken to be the same as KA1, since notebooks were not available for KA2). By alternate forms: TGT (all single forms) = .93 ± .0069; TGT A+B = .96 ± .004 (double the length of the single forms).

Reliabilities of achievement measures (all New Stanford Achievement; alternate-form reliability).—

Year      Form V with Form W      Whole test
1932      .93 ± .0069             .96 ± .004
1933      .94 ± .0059             .97 ± .003
1934      .94                     .97
1935      .95 ± .005              .97 ± .003
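The probable errors attached to these coefficients appear consistent with the usual formula PE = .6745(1 − r²)/√N with N = 171. The article does not state the formula, so this is a check under that assumption (the function name is illustrative); the sketch reproduces the reported values to within about .0001, for the intelligence coefficients as well as the achievement ones.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson correlation: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


if __name__ == "__main__":
    # N = 171 is the number of pupils in the study; whether the original
    # computations used exactly this N and formula is an assumption.
    for r in (0.93, 0.94, 0.95, 0.96, 0.97):
        print(f"r = {r:.2f}   PE = {probable_error_r(r, 171):.4f}")
```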


Intercorrelations of intelligence data are given in Table I. The mean is .82. Intercorrelations of achievement data were not computed separately for V and W. Since the same achievement test is used throughout, and since the correlations between V and W are already stated and fairly high, this seemed unnecessary. The mean inter r of the combined V and W is .92, and coefficients range from .89 (year 1932 with 1935) to .94 (year 1933 with 1934).

The AQ distributions.—Sixteen distributions of AQ’s were computed. Eight used Form V of achievement and eight used Form W. To be able to utilize the composite achievement measure it was necessary in each instance to accept four of the eight sub-tests at approximately one semester’s interval from the intelligence test of the team.¹⁴ The arrangement is uniform across the teams, however, and so can have no detrimental influence. The alternative was to break the achievement measures into two parts, which would have produced a subject emphasis, with one half predominantly verbal and the other unduly influenced by arithmetic. Such a situation was undesirable, since a subject emphasis leads to the question of special abilities or disabilities, which is outside the purpose of the present undertaking. Moreover, because EQ and IQ rather than EA and MA are used, the separation of the two groups of sub-tests by one semester is of no consequence anyway.

TABLE I

INTERCORRELATION OF INTELLIGENCE QUOTIENTS,
BASED ON 171 CASES

Mean intercorrelation (raw) = .82; (corrected) = .85

Means = TGT B 110; TGT B 111; TGT A 110; KA 104; TGT A+B 112; CAVD 108; Stanford Binet 106; KA 102.

Medians = TGT B 111; TGT B 112; TGT A 111; KA 105; TGT A+B 111; CAVD 110; Stanford Binet 105; KA 101.

Sigmas = TGT B 14; TGT B 18; TGT A 13; KA 11; TGT A+B 12; CAVD 13; Stanford Binet 12; KA 11.

Correlations and intercorrelations.—The AQ distributions using Form V of achievement were correlated with those using Form W. Results are given in Table II. The mean r is .82, and coefficients range from .76 to .91. The same intelligence scores are used with both forms of achievement.

Distributions using Form V of achievement were intercorrelated (see Table III). These results sustain Chapman’s earlier contention that there is little relationship between AQ’s computed with different but supposedly comparable sets of measures.¹⁵ The mean inter r is only .35 and, no doubt, would be considerably lower but for the fact that four of the eight intelligence tests are the TGT and two the KA, and that four distributions of EQ’s are shared among the eight distributions of IQ’s.

¹⁴ Exact combinations of EQ and IQ are given in full in Appendix I.

¹⁵ J. Crosby Chapman, “The Unreliability of the Difference between Intelligence and Educational Ratings,” Journal of Educational Psychology, XIV (February, 1923).

TABLE II

FORM V ACHIEVEMENT QUOTIENTS CORRELATED
WITH FORM W ACHIEVEMENT QUOTIENTS,
BASED ON 163 CASES*

Intelligence Tests      Correlation of AQ’s from V and W of achievement
[Row labels and entries illegible in source.]

* Throughout the study the AQ distributions are designated by the intelligence test of the team.

TABLE III

INTERCORRELATION OF ACCOMPLISHMENT QUOTIENTS
USING FORM V OF ACHIEVEMENT ONLY,
BASED ON 163 CASES

Teams: 1. KA1   2. CAVD   3. Stanford Binet   4. KA2   5. TGT A+B   6. TGT A   7. TGT B1   8. TGT B2
[Individual coefficients illegible in source.]

Mean intercorrelation = .35

Means = KA1 106; CAVD 100; Stanford Binet 101; KA2 102; TGT A+B 96; TGT A 98; TGT B1 92; TGT B2 91.

Medians = KA1 106; CAVD 103; Stanford Binet 100; KA2 102; TGT A+B 96; TGT A 98; TGT B1 92; TGT B2 92.

Sigmas = KA1 9; CAVD 7; Stanford Binet 8; KA2 [illegible]; TGT A+B 6; TGT A 6; TGT B1 5; TGT B2 5.

A particularly clear-cut case showing the definitive power of the intelligence test can be had in the coefficient of .45 between the SB and the TGT A+B teams. This relationship of .45 is such an improvement over some of the other outcomes as to seem rather hopeful at first glance. But exactly the same achievement scores are used in the two teams. There can be no difference at all in pupil effort and teaching effectiveness. Motivation is eliminated entirely as a consideration, because the achievement scores are identical in the two distributions. Only the intelligence tests vary, and even they were administered within the same semester, so that time is no factor either. Still the two AQ distributions correlate only .45.


What is true entirely of the TGT A+B and SB is true in part of other combinations. The KA1 and CAVD teams have 50 per cent of their achievement scores in common, as do also the CAVD and the SB. Likewise, all single forms of TGT use identical achievement scores. These conditions help to make the mean inter r of .35 unduly high.


When achievement scores are not identical and the intelligence tests vary, most of the coefficients are so low as to make it unsafe to conclude that any relationship exists. These meager r’s become even more striking when one takes account of the correlations between the measures that contribute to the AQ. For example, the IQ’s in the CAVD and TGT A teams correlate .79 and the EQ’s correlate .93, but the AQ’s show only .13 ± .051. Again, the IQ’s in the KA1 and the TGT B2 teams correlate .80 and the EQ’s correlate .89, but the AQ’s show only .11 ± .047.


Most startling of all is the r of .076 ± .052 between the KA1 and the TGT A teams. Can this actually mean that there is no demonstrable relationship whatsoever between the motivation of pupils in the two periods? Or does it mean, more likely, that influences other than motivation are at work to reduce this coefficient? Unreliability of the instruments of intelligence and achievement can hardly be held responsible, since the KA1 shows a reliability of .96 and the TGT A of .93, while the corresponding educational scores show .93 and .95 respectively. Nor can time be entirely to blame, though it must be admitted that four years intervene between the AQ distributions and that the EQ relationship is lower for years 1932 and 1935 than for the other years. Still there is a correlation of .89 between these two EQ distributions, and that inclines one to look a little further for the explanation of the negligible AQ relationship.
The most potent influence appears to be rooted in the measuring instruments themselves. One gets an inkling of the extent of irregularity from the variety of mean-sigma patterns in each team and between the teams. KA1 has a mean IQ of 105 and a σ of 11; TGT A has a mean IQ of 111 and a σ of 13. Teamed with the IQ distribution of KA1 is an EQ distribution with a mean of 111 and a σ of 14; with TGT A, an EQ distribution with a mean of 102 and a σ of 13.

By definition the AQ is at unity (100) when the child’s achievement equals his capacity to achieve. This is true whether his scores are at +3σ or −3σ, so long as he measures alike in both traits. But what do we find empirically with respect to the KA1 and the TGT A teams? The answer is given in Table IV.


TABLE IV

AQ’S RESULTING AT THE SEVEN PRINCIPAL
POINTS IN THE KA1 AND THE TGT A
TEAMS, BASED ON 163 CASES

Location      KA1 Team      TGT A Team
              111
              109
              108
[Remaining entries illegible in source.]
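Table IV can be recomputed from the means and sigmas quoted in the preceding paragraphs. The sketch assumes that the “seven principal points” are the mean and ±1σ, ±2σ, ±3σ of each distribution, and that the pupil stands at the same σ position in IQ and in EQ; the function name is illustrative. Under that assumption it reproduces the three KA1 entries that survive (111, 109, 108) and the statements in the text: six KA1 points at 100 or above, none in TGT A, and the highest TGT A value below the lowest KA1 value.

```python
def aq_at_principal_points(m_iq, s_iq, m_eq, s_eq):
    """AQ = 100 * EQ / IQ for a pupil at the same sigma position k in both
    distributions, for k = +3, +2, ..., -3 (the assumed "seven principal points")."""
    return {k: 100 * (m_eq + k * s_eq) / (m_iq + k * s_iq) for k in range(3, -4, -1)}


# Means and sigmas as stated in the text for the two teams.
ka1 = aq_at_principal_points(m_iq=105, s_iq=11, m_eq=111, s_eq=14)
tgt_a = aq_at_principal_points(m_iq=111, s_iq=13, m_eq=102, s_eq=13)

for k in ka1:
    print(f"{k:+d} sigma   KA1 {ka1[k]:6.1f}   TGT A {tgt_a[k]:6.1f}")
```

Nothing about the pupils changes between the two columns; the spread comes entirely from the differing means and sigmas of the IQ and EQ distributions in each team.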


In the KA1 team, six of the seven designated points are at 100 or above; in the TGT A team, not a single 100 occurs. The highest AQ in the TGT A team is lower than the lowest in KA1. It can be expected, of course, that some pupils will reach the 100 mark in TGT A, because pupils do not ordinarily score at the same σ positions in both EQ and IQ. Still the indications are plain. One can safely assume that the mean AQ in the KA1 team will be high and the σ large, while the mean in the TGT A team will be low and the σ small. Reference once more to Table III bears out this assumption. The mean AQ of the KA1 team is 106, and the σ is 9, with a range from 81 to 129. The mean AQ of the TGT A team is 93; the σ is 6, with a range from 66 to 108. Out of a possible 163 pupils, 126 have AQ’s at 100 or above in the KA1 team and only 17 in the TGT A team. This is a shocking disparity if the AQ is actually a measure of pupil effort and teaching effectiveness, which seems improbable despite the lower achievement relationship for the two years.
The improbability is sustained in the number of perfect AQ’s (at 100 or above) for each of the eight distributions. These are given in Table V. Only nine pupils are working up to capacity at the last testing period. The rest have dropped below the horizon of effort, even though the mean EQ is above the 100 mark (102).


TABLE V

NUMBER OF AQ’S AT 100 OR ABOVE IN EACH
TEAM, BASED ON 163 CASES

Year of Experiment      Number of AQ’s at 100 or above
1st
1st and 2nd
2nd
[Remaining rows and all counts illegible in source.]


It is worth noting, however, that unified AQ’s are characteristically few for TGT and many for KA, irrespective of where they occur in the experiment. The mean achievement for KA2 is not significantly higher than that for TGT B2 (104 and 102 respectively), yet the number of unified AQ’s drops from 98 to 9. The mean achievement for TGT A+B is three points higher than that for KA2 (107 and 104 respectively), yet it shows only 39 unified AQ’s as compared with 98 for KA2. The SB and the TGT A+B offer conclusive evidence of the definitive power of the tests of intelligence irrespective of achievement, because they use the same achievement distribution and still show a difference of at least fifty in the number of pupils who are working up to capacity. This difference comes about merely by changing from one standardized test to another in measuring intelligence. There can be no actual difference in intelligence, because the tests were administered very near to each other in time. Nor can there be any change whatsoever in the work being done by either pupil or teacher, and yet there is a decisive change in the standing of at least fifty members of the group.

It is this exaggeration of error in the ordinary AQ that has prompted various students to try to improve the technique. The RAQ is often suggested to take the place of the AQ because it is thought to be free of this exaggeration of error and to be more nearly what the ordinary non-technical worker believes the AQ to be. The present investigation, therefore, was extended to the RAQ.


REGRESSION ACCOMPLISHMENT QUOTIENTS 


With exactly the same scores and in ident- 
ical combinations as those used for Form V 
of the AQ’s, Regression Accomplishment Quo- 
tients were computed and the eight distribu- 
tions intercorrelated. Results are shown in 
Table VI. In keeping with the claims made for the RAQ, the means are now practically all at unity (100). SB and KA2 are the only exceptions, and even these are not more than one point removed. In keeping further with the advantages claimed, we find ideal correlations of RAQ with IQ and with EQ: negative and low, or better still, no relationship at all, with IQ; and positive but as low as possible with EQ (see Table VII). But these ideal correlations apply equally well to the AQ (see Table VII also). If there is any advantage, it favors the AQ rather than the RAQ. Both correlate ideally with IQ; both correlate fairly satisfactorily with EQ, being a little high for most of the tests in either case, and higher for the RAQ. At the bottom of Table VII is a composite distribution made from the four-year measures. It is inserted in the table as a standard with which to compare the single distributions.

Since the RAQ renders the means equal and equalizes the number of ideal quotients in each team, the intercorrelations become particularly interesting. One naturally wishes to know what improvement has been made between the teams (Table VI). It is a little disappointing to find, however, that the mean inter r for the RAQ’s is only .39, as compared with .35 for the ordinary AQ, an improvement of only .04. The .39 is still low and subject to all the discounts previously pointed out with reference to the AQ. Insofar as these data are representative, it must be said of the RAQ as of the AQ that there is little comparable meaning between quotients computed with different but supposedly comparable sets of measures.


PINTNER’S DIFFERENCE METHOD 


The RAQ modification succeeded in equalizing the means of EQ and estimated EQ within each team, but it left the means still unequal from one team to another. KA1, for example, has a pair at 111, TGT A at 102, and the others vary between. The sigmas were not equalized, however, either within the teams or between them. Because the Pintner method equalizes both means and sigmas throughout all distributions, it was thought advisable to apply it to these data in order to discover to what extent this equalization increases the intercorrelations.

TABLE VI

INTERCORRELATIONS OF REGRESSION ACCOMPLISHMENT QUOTIENTS USING FORM V
OF ACHIEVEMENT ONLY, BASED ON 163 CASES

Intelligence tests, each teamed with its estimated EQ: 1. KA1   2. CAVD   3. Stanford Binet   4. KA2   5. TGT A+B   6. TGT A   7. TGT B1   8. TGT B2
[Individual coefficients illegible in source.]

Mean r = .39

Means = KA1 100; CAVD 100; Stanford Binet 99; KA2 99; TGT A+B 100; TGT A 100; TGT B1 100; TGT B2 100.

Medians = KA1 100; CAVD 100; Stanford Binet 99; KA2 99; TGT A+B 101; TGT B 101; TGT B 101.

Sigmas = KA1 8; CAVD 7; Stanford Binet 8; KA2 9; TGT A+B 6; TGT A 6; TGT B1 6; TGT B2 6.

TABLE VII

CORRELATION OF RAQ AND OF AQ WITH IQ AND WITH EQ, BASED ON 163 CASES

RAQ with IQ      RAQ with EQ      AQ with IQ      AQ with EQ
[Entries illegible in source.]

The Pintner method proved laborious, however, and other findings, soon to be presented, so completely invalidated the AQ concept, difference and quotient methods alike, that the Pintner formula was not applied as extensively as the other two. Only four teams were converted: TGT A, CAVD, SB, and KA. To the degree that these four are representative of the whole intercorrelation table, it may be concluded that the improvement is negligible, amounting on the average to less than one point over the ordinary AQ.

Apparently the real difficulty is not in the 
technique but in the tests, and any modifica- 
tion that does not deal directly with the tests 
themselves is merely “hacking at the branches 
and ignoring the roots” of the difficulty. A 
teacher can administer the KA one day and 
go home well pleased with himself and his 
class; the majority are working up to capac- 
ity. He can administer the TGT the next day 
and find a very different situation; his own 
contributions and those of practically every 
member of the group are under suspicion. 

Studies in motivation and personality adjustment are popular in education at the present time. The measure of motivation is usually taken to be the numerical difference between intelligence scores and achievement scores, and intelligence is frequently measured by a single test such as the American Council Psychological Examination or the Thorndike College Entrance. One is forced to be skeptical of findings and conclusions based upon these single samplings. Since the measure of motivation varies so widely from one instrument to another, how can we know which measures to credit?

The common answer in theory, if not in 
ordinary practice, is to credit a combination 
of them all. This, of course, will increase the 
reliability of the AQ. But to what degree 
that reliability is increased and whether un- 
reliability is the only objection to the AQ 
procedure are points of attack in the next 
part of the study. 


TREATMENT OF THE DATA: THE
SYNTHETIC APPROACH 


Subsidiary questions pertaining to combining the measures.—The question naturally arises as to the legitimacy of combining measures over a four-year period. This seemed reasonably justified with respect to the achievement scores, because the same test was used throughout, showing a mean year-to-year consistency of .92 and a mean reliability of .97. Combining the measures of intelligence was less obviously warranted, since four different tests were employed and the administrations were not uniform and regular as was the case with achievement. Fortunately KA and TGT were repeated, so that comparisons could be made to ascertain whether the IQ’s fluctuated reliably more over a two-and-one-half-year period than over a half-year (2½ years represent the widest span that could be checked). These outcomes show:

KA 1935 with TGT 1933 (2½ yrs.) = .73
KA 1935 with TGT 1935 (½ yr.) = .82
KA 1933 with TGT 1935 (2½ yrs.) = .79
KA 1933 with TGT 1933 (½ yr.) = .77


While it is true that .82 for a ½-year span is considerably higher than .73 and a little higher than .79 for 2½ years, there is also a difference when the span of time is the same: .77 and .82 for ½ year, and .73 and .79 for 2½ years. It seemed reasonable to conclude, therefore, that the tests themselves (more than the element of time) were accounting for the difference and that weighting to reduce intra-semester influence was not necessary. Previous work with these data at the Institute shows an r of .98 between weighted and unweighted scores and serves to make weighting seem hardly worth while.


Converting the several intelligence distributions into a master distribution.—For the sake of simplicity in the formula work that is to follow, it seemed best to adapt all intelligence scores to a master or standard distribution with a mean equal to the average mean of all the tests and a sigma equal to the average sigma. The conversion was made following the method of Conrad,¹⁶ a simple and straightforward process that manipulates only the non-standard distribution, requires no deviations from the mean to be computed as with Woodworth’s method,¹⁷ and makes no assumptions whatsoever except such as are implied in the use of any standard distribution, mean, or sigma. Because CAVD and TGT A+B are tests of double length, they were given double weight in the conversion.
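A conversion of this kind can be sketched as a linear rescaling to a target mean and sigma. Note that the function below does compute deviations from the mean, so it is a generic stand-in that reaches the same target mean and sigma, not a reproduction of Conrad’s procedure. The function names are hypothetical; the means and sigmas for CAVD, TGT A+B, and Stanford Binet are read from Table I, only three of the eight tests are used to keep the illustration short, and the raw scores being converted are made up. Double weight goes to the double-length tests, as the text describes.

```python
from statistics import mean, pstdev


def master_parameters(params, weights):
    """Weighted average of the tests' means and sigmas.

    params:  {test name: (mean, sigma)}
    weights: {test name: weight}; 2 for double-length tests, 1 otherwise
    """
    total = sum(weights[name] for name in params)
    m = sum(weights[name] * params[name][0] for name in params) / total
    s = sum(weights[name] * params[name][1] for name in params) / total
    return m, s


def to_master(scores, master_mean, master_sigma):
    """Linearly rescale scores so that their mean and sigma equal the master values."""
    m, s = mean(scores), pstdev(scores)
    return [master_mean + master_sigma * (x - m) / s for x in scores]


# Means and sigmas for three of the tests, read from Table I.
params = {"CAVD": (108, 13), "TGT A+B": (112, 12), "Stanford Binet": (106, 12)}
weights = {"CAVD": 2, "TGT A+B": 2, "Stanford Binet": 1}
master_mean, master_sigma = master_parameters(params, weights)
print(master_mean, master_sigma)

# Hypothetical raw IQs on one test, converted to the master scale.
print([round(x, 1) for x in to_master([92, 104, 110, 118, 126], master_mean, master_sigma)])
```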


Estimated reliability of the four-year averages.—The
Spearman-Brown Prophecy Formula was used in obtaining
the estimated reliability of the four-year measure of
achievement. The formula is permissible in this
instance because of the similarity of the sub-tests
in the battery, as shown by their fairly high inter r
of .92. The mean r of Form V with Form W was .94;
that of the whole test, .97; and that of four
lengthenings of the test, as follows:

$$ r_{4} = \frac{4 \times .97}{1 + (4 - 1)\,.97} = .99 $$

(estimated reliability of the four-year measure of
achievement).
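The prophecy formula is easy to reproduce. In the minimal Python sketch below, the second step is the computation shown above. The first step is a presumption rather than something the text states: that the whole-test value of .97 is itself the Spearman-Brown estimate for Forms V and W combined (the .94 doubled).

```python
def spearman_brown(r: float, n: float) -> float:
    """Spearman-Brown prophecy: reliability of a test lengthened n times,
    given the reliability r of the original length."""
    return n * r / (1 + (n - 1) * r)

# Presumed step: Form V with Form W (.94), doubled to the whole test.
r_whole = spearman_brown(0.94, 2)
# Step shown in the text: the whole test (.97) lengthened four times.
r4 = spearman_brown(0.97, 4)
print(f"{r_whole:.4f} {r4:.4f}")  # 0.9691 0.9923
```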


The prophecy formula is not applicable to
the intelligence scores because the battery is
made up of sub-tests quite different in
measuring capacity (mean inter r = .82).
Instead, Spearman's Formula for the r of Sums,
as expounded in particular by Douglass and
Cozens,¹⁸ was used. In order to state the
formula simply we may let:

16 Herbert S. Conrad, "The Adjustment of Frequency Distributions," Journal of Educational Psychology (May, 1930), 386-7.

17 R. S. Woodworth, "Combining the Results of Several Tests: A Study in Statistical Method," Psychological Review, XIX (March, 1912), 97-123.

18 H. R. Douglass and F. W. Cozens, "On Formula for Estimating the Reliability of Test Batteries," Journal of Educational Psychology (May, 1929), 369-377.
c = CAVD; k₁ = KA₁; s = SB; t₁ = TGTa,b;
k₂ = KA₂; t₂, t₃, t₄ = the three remaining TGT distributions;

and let the alternate forms of these tests be
designated by capital letters matching the
small ones, as C, K, S, etc. Also, let "p" and
"q" be general designations for c, k, s, etc.,
and "P" and "Q" be general designations for
C, K, S, etc. The formula may then be
written:
$$ r_{(c+k_1+s+t_1+k_2+t_2+t_3+t_4)(C+K_1+S+T_1+K_2+T_2+T_3+T_4)} = \frac{\sum_{1}^{64} r_{pQ}}{\sqrt{\left(8 + \sum_{1}^{56} r_{pq}\right)\left(8 + \sum_{1}^{56} r_{PQ}\right)}} = .99 $$

(estimated reliability of the four-year measure of
intelligence).

Of the 92 correlations involved in this
formula, only one, s with S, was estimated
outright. The reliabilities and the
intercorrelations have already been presented in the
preliminary analysis and evaluation of the
measures and, therefore, will not be restated
here. The computational work of the formula
is greatly simplified by the fact that all
intelligence scores are in the form of a master
distribution and all sigmas are equal.

Since the combined measure of intelligence
shows a reliability of .99 and the combined
measure of achievement shows a reliability
of .99, we can proceed to the matter of the
composite AQ with the assurance that we are
working with highly reliable instruments
which will exhibit the AQ under optimum
conditions.

Reliability of the composite AQ.—Knowing
the reliabilities of the composites IQ and EQ,
it was now possible to compute the reliability
of the composite AQ. For this purpose
Holzinger's formula for the correlation between
ratios was used. If we write EQ as e, IQ as i,
and V as σ/M (capital letters again denoting
the alternate forms), we may apply this
formula as:

$$ r_{\left(\frac{e}{i}\right)\left(\frac{E}{I}\right)} = \frac{r_{eE} V_e V_E + r_{iI} V_i V_I - r_{eI} V_e V_I - r_{iE} V_i V_E}{\left(V_e^2 + V_i^2 - 2 r_{ei} V_e V_i\right)^{1/2} \left(V_E^2 + V_I^2 - 2 r_{EI} V_E V_I\right)^{1/2}} = .84 $$

(reliability of the composite AQ).

Here, then, is the very moderate reliability
of only .84 for the AQ resulting from the
total four-year testing period. The equivalent
of ten samplings of intelligence goes to
make up this measure, and the achievement
test represents eight separate school subjects
in every instance. Such a sampling is almost
prohibitive in the ordinary school situation,
and still the reliability of this composite AQ
leaves much to be desired. Moreover,
reliability is not the only point to be considered.
What of the validity of the composite AQ?

The combined EQ distribution was correlated
with the combined IQ distribution and showed
a coefficient of .94 (corrected .95). The
validity of the AQ concept must depend upon
the clearness of the distinction between the two
measures implied, and it seems that this
distinction, in the present case at least, is not
very clear. There remain only five points
(1.00 − .95) with which to differentiate
intelligence and achievement, and even these five
may not be measuring accurately. This result
supports empirically the theoretical findings of
Kelley, who holds that there is a correlation of
.95 between our "so-called measures of
intelligence and achievement". In fact, this
empirical finding even surpasses Kelley's
theoretical one, for it uses a grade range, whereas
Kelley uses an age range, and the narrower
range would ordinarily depress a correlation.
Moreover, Kelley's age range is a very liberal
one, equivalent to six grades, III to VIII
inclusive.

Correlation of each intelligence distribution
with each achievement distribution.—It was
now the purpose of the investigation to
return to the individual distributions in order
to compare the degree of overlapping between
intelligence and achievement with respect to
each of the various intelligence tests.
Sixty-four correlations are involved in this
comparison, thirty-two with Form V of achievement
and thirty-two with Form W. The
coefficients are given in Table VIII. The
greatest amount of overlapping is between the
Terman Group and New Stanford Achievement.
The least is between Kuhlman-Anderson
(second administration) and New Stanford
Achievement.

Table IX gives comparatively the mean
correlation of each intelligence distribution
with each of the other intelligence
distributions and of each intelligence distribution
with each achievement distribution. Except
for the KA, there is greater overlapping
between intelligence and achievement than
between intelligence inter se. One KA
administration overlaps as much with achievement as
with intelligence, and the other only slightly
less. The mean overlapping of intelligence and
achievement is .83 (corrected .88), actually
a little greater for intelligence with
achievement than for intelligence with intelligence.¹⁹

It seems useless to talk of the measured
difference between capacity and achievement
when the instruments designed to measure
these two traits simply do not distinguish
between them. Such a situation automatically
destroys the accomplishment quotient concept
and robs it of meaning and utility.

19 Using the 1916 Revision of Stanford Binet, the Otis Group Intelligence Scale, Advanced Examination, and Form V of the New Stanford Achievement, Advanced Examination, Cureton finds a raw mean r of .87 between intelligence and achievement as compared with .87 for intelligence inter se.

THE ACCOMPLISHMENT QUOTIENT

TABLE VIII

RAW CORRELATIONS OF EACH INTELLIGENCE DISTRIBUTION WITH EACH
ACHIEVEMENT DISTRIBUTION, BASED ON 163 CASES

                              Achievement Distributions
Intelligence            Form V                          Form W
Tests          No.1   No.2   No.3   No.4      No.1   No.2   No.3   No.4
   …             …      …    .797   .760      .791   .793   .767   .781
   …             …      …    .761   .794      .823   .818   .785   .782
   …             …      …    .754   .753      .743   .776   .753   .757
   …             …      …    .736   .731      .718   .739   .718   .724
   …             …      …    .874   .857      .863   .876   .871   .834
   …             …      …    .885   .902      .883   .890   .919   .880
   …             …      …    .889   .902      .885   .897   .904   .900
   …             …      …    .883   .903      .858   .882   .901   .875

Mean Intercorrelation = .828 (corrected = .883)
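The r-of-sums formula used earlier for the reliability of the combined intelligence measure can be sketched in Python. This assumes, as the text does, that all measures have equal sigmas after conversion to the master distribution. With n = 8 measures the numerator sums n² = 64 cross-form coefficients and each parenthesis adds n(n − 1) = 56 off-diagonal intercorrelations. The actual coefficients appear in an earlier part of the study, so the inputs below are hypothetical uniform values.

```python
from itertools import product
from math import sqrt

def r_of_sums(r_cross, r_within_a, r_within_b):
    """Correlation between two sums of n equal-sigma measures.

    r_cross[i][j]    : r of measure i in sum A with measure j in sum B
    r_within_a[i][j] : intercorrelations within sum A (diagonal ignored)
    r_within_b[i][j] : intercorrelations within sum B (diagonal ignored)
    """
    n = len(r_cross)
    pairs = list(product(range(n), repeat=2))
    numerator = sum(r_cross[i][j] for i, j in pairs)
    off_a = sum(r_within_a[i][j] for i, j in pairs if i != j)
    off_b = sum(r_within_b[i][j] for i, j in pairs if i != j)
    return numerator / sqrt((n + off_a) * (n + off_b))

def uniform(n, diag, off):
    """Hypothetical n-by-n matrix with one value on and one off the diagonal."""
    return [[diag if i == j else off for j in range(n)] for i in range(n)]

# Invented values: reliabilities .90, cross-form r .80, intercorrelations .82.
n = 8
r = r_of_sums(uniform(n, 0.90, 0.80), uniform(n, 1.0, 0.82), uniform(n, 1.0, 0.82))
print(round(r, 3))  # 0.964
```

With a single measure the function reduces to that measure's alternate-form reliability, which is a quick sanity check on the formula.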

Here, at last, it seems we have an explan- 
ation for the absurdly low correlations be- 
tween TGT and KA, since the former shows 
the greatest amount of overlapping with the 
New Stanford Achievement and the latter 
shows the least. We can see, also, why the 
RAQ was unable to improve the intercorrela- 
tions to any appreciable degree. Because the 
intelligence scores are as closely related to
achievement as they are to intelligence, we 
can hardly expect that EQ divided by esti- 
mated EQ (estimated from IQ) will differ 
greatly from EQ divided by IQ. 
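Holzinger's formula, used earlier for the reliability of the composite AQ, shows the same mechanism numerically. The Python sketch below applies the formula as written in the text (V = σ/M). The inputs are invented: EQ and IQ are each given a reliability of .99, but they correlate .94 with each other (close to the .94 reported for the composites), and all coefficients of variation are set equal for simplicity.

```python
from math import sqrt

def ratio_correlation(r_eE, r_iI, r_eI, r_iE, r_ei, r_EI, V_e, V_i, V_E, V_I):
    """Correlation between the ratios e/i and E/I by the formula in the
    text; each V is a coefficient of variation (sigma / mean)."""
    numerator = (r_eE * V_e * V_E + r_iI * V_i * V_I
                 - r_eI * V_e * V_I - r_iE * V_i * V_E)
    denominator = (sqrt(V_e**2 + V_i**2 - 2 * r_ei * V_e * V_i)
                   * sqrt(V_E**2 + V_I**2 - 2 * r_EI * V_E * V_I))
    return numerator / denominator

V = 0.15  # hypothetical common coefficient of variation
aq_reliability = ratio_correlation(
    r_eE=0.99, r_iI=0.99,   # reliabilities of EQ and IQ
    r_eI=0.94, r_iE=0.94,   # EQ of one form with IQ of the other
    r_ei=0.94, r_EI=0.94,   # EQ with IQ within each form
    V_e=V, V_i=V, V_E=V, V_I=V,
)
print(round(aq_reliability, 3))  # 0.833
```

Even with near-perfect components, the ratio keeps only the small part of each measure that the other does not share, so its reliability falls to about .83 under these assumptions.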

TABLE IX

MEAN CORRELATIONS OF EACH INTELLIGENCE DISTRIBUTION WITH EACH INTELLIGENCE
DISTRIBUTION AND OF EACH INTELLIGENCE DISTRIBUTION WITH EACH ACHIEVEMENT
DISTRIBUTION, BASED ON 163 CASES

Intelligence     IQ with IQ     IQ with IQ     IQ with EQ     IQ with EQ
Tests                           (corrected)                   (corrected)
   …                 …               …              …            .957
   …                 …               …              …            .946
   …                 …               …              …            .951
   …                 …               …              …            .828
   …                 …               …              …            .914
   …                 …               …              …            .854
   …                 …               …              …            .827
   …                 …               …              …            .763
Mean                 …               …              …            .883

The mean correlation of .83 for the single
distributions of IQ with those of EQ becomes
.88 when corrected for attenuation (Table
VIII). This figure falls a little short of the
correlation between the composites EQ and
IQ; there is a deficiency of nearly .06. It is
believed, however, that the discrepancy is
owing, in part at least, to the overstatement
of some of the test reliabilities, especially
those obtained by the split-halves technique:
an inflated reliability enlarges the denominator
of the correction and so yields too small a
corrected coefficient. The correction for
attenuation formula assumes alternate-form
reliabilities. When pupils are tested on one
occasion only, temporary influences do not
cancel out as they do with different testings.
Instead, they tend to be cumulative, all working
in the same direction, and to raise the
split-half correlations unduly.
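The direction of this effect can be seen by applying the standard correction for attenuation to one raw coefficient under two sets of reliabilities. In the Python sketch below, the raw r of .83 comes from the text. The reliabilities are hypothetical: one pair is modest, and the other is inflated in the way split-half estimates may be.

```python
from math import sqrt

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)

raw = 0.83  # mean raw r of single IQ with single EQ distributions
modest = correct_for_attenuation(raw, 0.90, 0.90)    # hypothetical reliabilities
inflated = correct_for_attenuation(raw, 0.96, 0.96)  # hypothetical, overstated
print(round(modest, 3), round(inflated, 3))  # 0.922 0.865
```

Overstated reliabilities shrink the correction, so a corrected coefficient built on them understates the true relationship.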


SUMMARY AND CONCLUSIONS 


1. The AQ is a distinctly unreliable 
measure from one set of tests to another, 
showing a relationship so low in most cases 
as to make it unsafe to conclude that any 
exists. This is true regardless of the technique 
employed. The factor of motivation is ren- 
dered largely subordinate to the definitive 
power of the tests. There is a mean inter r 
of only .35 for the eight AQ distributions 
despite the fact that four of the intelligence 
tests are the TGT, two the KA, and that the 
four different EQ distributions supply the
same scores for eight different IQ distributions.
Obviously this .35 would be considerably
less if the achievement scores varied in
each instance and no intelligence tests were
repeated. The number of ideal quotients (100
or above) is similar whenever the same
intelligence test is employed and is often
shockingly divergent when different intelligence
tests are used, being characteristically few
with TGT and many with KA wherever these
occur in the experiment, regardless of the mean
EQ. Large numbers of pupils achieve up to
capacity or fall below capacity, without the
slightest change in their actual work, merely
because one accepted test of intelligence is
substituted for another.


2. The combined four-year achievement 
measure shows a reliability of .99 and the 
combined four-year intelligence measure 
shows a reliability of .99. The composite
AQ, however, shows a reliability of only .84.
With almost perfectly reliable instruments, 
the AQ still leaves much to be desired. 


3. The combined EQ distribution correlates 
.94 (corrected = .95) with the combined IQ 
distribution, leaving a difference of only five
points with which to distinguish the two 
traits and even these five may not be meas- 
uring accurately. This is empirical corrobora- 
tion of Kelley’s statistical findings and forces 
one to agree with Kelley that the AQ is prob- 
ably safest in the discard. 
4. The individual IQ distributions show a
mean inter r of .82 and a mean r with
achievement of .83. Every IQ distribution
except the KA correlates at least as highly
with achievement as with the other measures
of intelligence, and even with the KA the
difference is less than one point.
TGT overlaps appreciably more with New
Stanford Achievement than with the other
measures of intelligence. Again, these findings
are sufficient in themselves to invalidate the
AQ and rob it of meaning and utility. The
overlapping of the measures of intelligence
and achievement introduces serious error
into the concept, affecting Pintner's method
and the quotient methods alike.
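Point 1 of the summary, that the count of ideal quotients depends on which intelligence test is used, can be illustrated with a small Python sketch. It assumes the conventional scaling AQ = 100 × EQ/IQ, so that "ideal" means 100 or above. The pupils and both sets of IQs are invented: one hypothetical test yields somewhat higher IQs than the other, mimicking the TGT/KA contrast without using any data from the study.

```python
def accomplishment_quotients(eq, iq):
    """AQ = 100 * EQ / IQ for each pupil."""
    return [100 * e / i for e, i in zip(eq, iq)]

def count_ideal(aqs):
    """Number of 'ideal' quotients (AQ of 100 or above)."""
    return sum(1 for aq in aqs if aq >= 100)

# Hypothetical pupils: one set of EQs, IQs from two different tests.
eq = [104, 112, 98, 120, 101]
iq_test_a = [108, 115, 101, 118, 106]  # a test yielding higher IQs
iq_test_b = [99, 106, 94, 113, 98]     # a test yielding lower IQs

aq_a = accomplishment_quotients(eq, iq_test_a)
aq_b = accomplishment_quotients(eq, iq_test_b)
print(count_ideal(aq_a), count_ideal(aq_b))  # 1 5
```

The achievement scores are identical in both runs; only the intelligence test changes, yet four of the five pupils switch from "below capacity" to "up to capacity".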


DISCUSSION 


If the communality between intelligence
and achievement in the present findings
exceeds that usually reported, it does not
necessarily follow that the present figures
are too high. These relationships may be a
truer picture than is usually obtained. The
sampling is large, the measures are highly
reliable, and the tests are some of our best,
only SB having undergone improvement since
this study was made.

There are one or two circumstances that may
possibly have increased the IQ-EQ relationship,
and these should be mentioned here.
First of all, there is the comprehensive nature
of the New Stanford Achievement test. It
represents a two-hour sampling and covers
eight of the common school branches. The
intelligence tests are much shorter, varying
in length from thirty minutes to one hour.
This difference in length suggests an
advantage for the IQ-EQ correlations over the
IQ-IQ correlations, since the longer test is the
more reliable.

In the second place, it is possible that the 
Claremont junior high school psychology 
leans towards the “progressive persuasion.” 
To the extent that this is true, pupils will 
lack drill in the common school branches and 
thereby tend to score more similarly in the 
two types of tests. 

A third point, and one that will no doubt 
be interpreted according to one’s definition 
of intelligence, is the possibility that the stu- 
dents of the present experiment are “test 
wise.” Certainly they are tested frequently 
and are very much at home in the testing 
atmosphere and with test materials. The AQ 
concept rests upon the assumption that the 
intelligence test is not susceptible to training, 
but that the achievement test responds delicately
to changes in the environment. There
are studies in the literature, however, scien- 
tifically demonstrating that the types of 
material usually found in intelligence tests 
can be taught and scores markedly increased 
thereby. Also, it is generally conceded that 
forty per cent of our male population in the 
last World War were rated feebleminded by 
the Army Alpha and that the rating was un- 
fair to adults, in view of the fact that the 
test was standardized on school children. 
Adults were handicapped in performing the 
kind of tasks required in the test because they 
were out of practice with school tasks. If 
lack of practice with school tasks can serve 
to lower intelligence scores, perhaps excessive 
practice can serve to increase them. 
Be these things as they may, the AQ came
into use on the assumption that intelligence 
is general and that achievement is always 
based upon the same kind of intellectual 
capacity. After years of advanced study, 
however, our factor analysts are not at all 
sure that intelligence is general or how much 
of it is general and how much is specific. Per- 
haps, when “primary abilities” and “group 
factors” are more definitely defined, we may 
be able to measure intelligence with increased 
accuracy and clearness. Until then, however, 
and while we have only our present instru- 
ments, we are forced to be skeptical of 
research studies, as well as other educational 
procedures, which continue to treat the ac- 
complishment quotient as a precise statistical 
measure. 





JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 10, No. 1]


APPENDIX I 


THE VARIOUS COMBINATIONS OF EQ/IQ FOR THE SIXTEEN DISTRIBUTIONS OF AQ
(For every combination of EQ/IQ listed below, two distributions of AQ's were
computed, one using Form V of achievement and one using Form W.)

Lower Grade

EQ = New Stanford Achievement, Fall 1932 and Spring 1933.
IQ = Kuhlman-Anderson (KA₁), Spring 1933.

EQ = New Stanford Achievement, Fall 1932 and Spring 1933.
IQ = CAVD, Fall 1932.

EQ = New Stanford Achievement, Fall 1933 and Spring 1934.
IQ = Stanford Binet, Fall 1933.

EQ = New Stanford Achievement, Fall 1933 and Spring 1934.
IQ = Terman Group Test (TGTa,b), Fall 1933.

EQ = New Stanford Achievement, Fall 1934 and Spring 1935.
IQ = Kuhlman-Anderson (KA₂), Spring 1935.

EQ = New Stanford Achievement, Fall 1935 and Spring 1936.
IQ = Terman Group (TGT), Fall 1935.

EQ = New Stanford Achievement, Fall 1935 and Spring 1936.
IQ = Terman Group (TGT), Fall 1935.

EQ = New Stanford Achievement, Fall 1935 and Spring 1936.
IQ = Terman Group (TGT), Spring 1936.

Higher Grade

EQ = New Stanford Achievement, Spring 1932 and Fall
IQ = Kuhlman-Anderson (KA₁), Spring 1932.

EQ = New Stanford Achievement, Spring 1933 and Fall
IQ = CAVD, Spring 1933.

EQ = New Stanford Achievement, Spring 1933 and Fall
IQ = Stanford Binet, Spring 1933.

EQ = New Stanford Achievement, Spring 1933 and Fall
IQ = Terman Group (TGTa,b), Spring 1933.

EQ = New Stanford Achievement, Spring 1934 and Fall
IQ = Kuhlman-Anderson (KA₂), Fall 1934.

EQ = New Stanford Achievement, Spring 1935 and Fall
IQ = Terman Group (TGT), Spring 1935.

EQ = New Stanford Achievement, Spring 1935 and Fall
IQ = Terman Group (TGT), Spring 1935.

EQ = New Stanford Achievement, Spring 1935 and Fall
IQ = Terman Group (TGT), Fall 1935.








